mikeizbicki / cmc-csci181-languages


Tests.yml failing for declaration.txt #9

Closed maxplush closed 3 weeks ago

maxplush commented 3 weeks ago

I'm struggling to reach the step shown at this point in yesterday's recorded lecture. For some reason my test cases fail even when declaration.txt is the only test (as in the video).

I ran the test command in my terminal and it works fine. But this is the error I got when I looked into why it failed.

[Screenshot: 2024-09-05 at 12:32:50 PM]

This is my current tests.yml file code.

name: tests

on:
  push:
    branches: ['*']
  pull_request:
    branches: ['*']

jobs:
  tests:
    strategy:
      matrix:
        python: [3.8, 3.9]
    runs-on: ubuntu-latest
    env:
      GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python ${{matrix.python}}
        uses: actions/setup-python@v2
        with:
          python-version: ${{matrix.python}}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
      - name: test declaration
        run: |
          python3 docsumvim.py docs/declaration.txt
          # The following line runs the script on all files in the docs directory
          for file in docs/*; do python3 docsumvim.py "${file}"; done

As you can see the command itself runs in the terminal.

(venv) maxplush@Maxs-MacBook-Pro-4 docsum-vim % python3 docsumvim.py docs/declaration.txt 
Here is a summary of the text at a first-grade reading level:

A long time ago, the United States of America was first born. Some people, called the Founding Fathers, wrote a special letter to explain why they wanted to be free from another country, called England. They said everyone is equal and has the right to be happy. The king of England was being mean to them, and they didn't like it. He was making rules and taking away their rights. The Founding Fathers decided it was time to say "no" and make their own country. They wrote this special letter, called the Declaration of Independence, to tell everyone why they wanted to be free. They said they would do their best to make sure everyone is happy and safe, and they would fight for their freedom.

mikeizbicki commented 3 weeks ago

Good question.

Unfortunately, there's not enough info here for me to give a definitive answer. In general, when working with GitHub, you should post links to files and line numbers rather than copying and pasting the files. This keeps any extra context that may be necessary easy to find and avoids "stupid" errors from copy/paste mistakes. In your case, I tried to find your repo to look up the files, but your account doesn't have any public repos. Did you make it private? In general, I always recommend making all of your repos public (there are no downsides and only upsides), and for this class the repos must be public for submission.

All that said, I think I know what the answer is. My guess is that your GitHub repo contains many documents in the docs folder, and so the for loop

          for file in docs/*; do python3 docsumvim.py "${file}"; done

is accessing those files and trying to read them. Some of those files (e.g. 2023.findings-emnlp.945.pdf) are not valid UTF-8 documents, however, and so you are getting that error when reading them.
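To illustrate that failure mode, here is a minimal sketch of how a binary file breaks a UTF-8 read while a plain text file does not. The helper name `is_valid_utf8`, the temporary directory, and the file `example.pdf` (a few hand-written bytes, not a real PDF) are all assumptions made for the example; none of this is taken from docsumvim.py.

```python
import tempfile
from pathlib import Path


def is_valid_utf8(path):
    """Return True if the file at `path` decodes cleanly as UTF-8."""
    try:
        Path(path).read_text(encoding="utf-8")
        return True
    except UnicodeDecodeError:
        # Raised when the file contains byte sequences that are not valid UTF-8.
        return False


with tempfile.TemporaryDirectory() as docs:
    # A plain text document, like docs/declaration.txt.
    text_file = Path(docs) / "declaration.txt"
    text_file.write_text("When in the Course of human events...", encoding="utf-8")

    # Hypothetical stand-in for a PDF: a header followed by raw binary bytes.
    pdf_file = Path(docs) / "example.pdf"
    pdf_file.write_bytes(b"%PDF-1.4\n\xe2\xe3\xcf\xd3\n")

    # Mirrors the shell loop: visit every file in the folder.
    for path in sorted(Path(docs).iterdir()):
        print(path.name, is_valid_utf8(path))
```

A check like this could be used to see which files in a docs folder would trip up a script that opens everything as UTF-8 text.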

The short-term fix to get things working is to ensure you only have the correct documents in the docs folder.

The long-term fix is to use the fulltext library to open the files (which will convert, e.g., from PDF to UTF-8 text).
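One way the long-term fix might be structured is sketched below: read the file as UTF-8 first, and hand it to a text extractor only when that fails. The function `read_document` and its `extract_text` parameter are hypothetical names invented for this sketch, and the extractor is passed in as an argument so the example runs without any third-party package installed.

```python
from pathlib import Path


def read_document(path, extract_text=None):
    """Return the text of `path`.

    Tries a plain UTF-8 read first. If the bytes are not valid UTF-8 and an
    `extract_text` callable was supplied, the path is passed to it instead.
    `extract_text` is a hypothetical hook: any function mapping a path to a str.
    """
    try:
        return Path(path).read_text(encoding="utf-8")
    except UnicodeDecodeError:
        if extract_text is None:
            # No fallback available, so surface the original error.
            raise
        return extract_text(str(path))
```

Assuming the fulltext package is installed and its `get` function accepts a file path, the call might look like `read_document(path, extract_text=fulltext.get)`. That usage is an assumption about how the pieces would fit together, not something confirmed against the class code.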

maxplush commented 3 weeks ago

Got it, thank you for pointing out the best practice. I did have my repo (docsum-vim) set to private; I just made it public. In the future I will include code links rather than snippets.

What I was trying to do is just get the tests passing for ONLY declaration.txt. I think I incorrectly commented out the line

for file in docs/*; do python3 docsumvim.py "${file}"; done

I just deleted that line, and now it shows as passing on my repo. Thanks for flagging the fulltext library again; I used it previously but saw you were not using it in class, so I removed it. I'll go ahead and add it back, along with the chunking steps.