maxplush closed this issue 3 weeks ago.
Good question.
Unfortunately, there isn't enough information here for me to give a definitive answer. In general, when working with GitHub, you should post links to the relevant files and line numbers rather than copy/pasting the files. This ensures that any extra context that may be needed is easy to find, and that there are no "stupid" errors resulting from copy/paste mistakes. In your case, I tried to find your repo so I could look up the files, but your account doesn't have any public repos. Did you make it private? In general, I always recommend making all of your repos public (there are no downsides and only upsides), and for this class it is required for submission that the repos be public.
All that said, I think I know what the answer is. My guess is that your GitHub repo contains many documents in the `docs` folder, and so the for loop `for file in docs/*; do python3 docsumvim.py "${file}"; done` is accessing those files and trying to read them. Some of those files (e.g. `2023.findings-emnlp.945.pdf`) are not valid UTF-8 documents, however, and so you are getting that error when reading them.
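To see why the loop fails on some files, here is a minimal Python sketch. It assumes `docsumvim.py` reads each file as UTF-8 text (its code isn't shown here). PDFs contain raw binary bytes, so decoding them as UTF-8 raises `UnicodeDecodeError`. The helper `is_utf8_text` is hypothetical and simply reports whether a file would decode cleanly.

```python
import tempfile
from pathlib import Path


def is_utf8_text(path):
    """Return True if the file's bytes decode as UTF-8, False otherwise.

    Hypothetical helper: it mimics what happens when a script opens a
    file in text mode with encoding="utf-8" and reads it.
    """
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        txt = Path(tmp) / "declaration.txt"
        txt.write_text("When in the Course of human events...", encoding="utf-8")

        # PDF files often start with a header plus a comment line of
        # high-bit bytes like these, which are not valid UTF-8.
        pdf = Path(tmp) / "paper.pdf"
        pdf.write_bytes(b"%PDF-1.5\n%\xe2\xe3\xcf\xd3\n")

        print(is_utf8_text(txt))  # True
        print(is_utf8_text(pdf))  # False
```

Running a check like this over the `docs` folder is one way to spot which files would trip the loop.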
The short-term fix to get things working is to ensure you only have the correct documents in the `docs` folder.
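If moving files around isn't convenient, the same short-term idea can be expressed by only passing the intended files to the script. This is a Python sketch of the loop restricted to an allow-list of extensions. The names `select_docs` and `build_commands` are hypothetical, and treating only `.txt` files as safe is an assumption. Unlike the shell glob `docs/*`, `iterdir()` also returns hidden files, so this is only roughly equivalent.

```python
import sys
import tempfile
from pathlib import Path


def select_docs(folder, suffixes=(".txt",)):
    """Return files in `folder` whose extension is in `suffixes`, sorted by name.

    Roughly what `docs/*` matches in the shell loop, minus e.g. PDFs.
    """
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )


def build_commands(paths, script="docsumvim.py"):
    """Build one `<python> docsumvim.py <file>` command per selected file."""
    return [[sys.executable, script, str(p)] for p in paths]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "declaration.txt").write_text("When in the Course...", encoding="utf-8")
        (Path(tmp) / "2023.findings-emnlp.945.pdf").write_bytes(b"%PDF-1.5\n")
        for cmd in build_commands(select_docs(tmp)):
            # In a real run you might execute it: subprocess.run(cmd, check=True)
            print(cmd)
```

Only `declaration.txt` is selected here; the PDF is skipped instead of crashing the run.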
The long-term fix is to use the `fulltext` library to open the files (which will convert them, e.g. from PDF, to UTF-8 text).
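A minimal sketch of the long-term fix, assuming the `fulltext` package is installed (`pip install fulltext`). `fulltext.get(path)` returns a file's extracted text as a `str`, though some formats such as PDF may need extra system dependencies depending on the backend. The helper name `load_text` and the plain UTF-8 fallback (used only when `fulltext` can't be imported) are assumptions for illustration, not part of the class code.

```python
from pathlib import Path

try:
    import fulltext  # third-party: pip install fulltext
except ImportError:  # fall back so the sketch still runs without it
    fulltext = None


def load_text(path):
    """Return the text content of `path` as a str.

    With fulltext installed, formats like PDF are converted to text.
    Without it, this hypothetical fallback only handles UTF-8 text
    files; undecodable bytes (e.g. from a PDF) become U+FFFD
    replacement characters instead of raising UnicodeDecodeError,
    so PDFs yield unusable text rather than a crash.
    """
    if fulltext is not None:
        return fulltext.get(str(path))
    return Path(path).read_text(encoding="utf-8", errors="replace")
```

In `docsumvim.py`, this would take the place of whatever `open(...).read()` call currently loads the document.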
Got it, thank you for pointing out the best practice. I did have my repo (docsum-vim) set to private; I just made it public. In the future I will include links to the code rather than snippets.
What I was trying to do was just get the tests passing for ONLY `declaration.txt`. I think I incorrectly commented out the line `for file in docs/*; do python3 docsumvim.py "${file}"; done`. I just deleted that line, and now it shows that it is passing on my repo. Thanks for flagging the `fulltext` library again. I used it previously but saw you weren't using it in class, so I removed it. I'll go ahead and add it back along with the chunking steps.
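Since chunking is the next step, here is a minimal sketch of one common approach. It splits the text into fixed-size character chunks with a small overlap, so context near a boundary appears in both neighboring chunks. The function name `chunk_text` and the default sizes are hypothetical. The class's chunking step may split on words, paragraphs, or tokens instead.

```python
def chunk_text(text, chunk_size=1000, overlap=100):
    """Split `text` into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters so context that
    straddles a boundary appears in both neighbors.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Each chunk can then be summarized separately, and the partial summaries combined in a final pass.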
I'm struggling to get to the step from yesterday's lecture shown at this point in the recorded lecture. For some reason my test cases fail when I only have `declaration.txt` as a test (like in the video).
I tested the test command in my terminal and it works fine, but this is the error I got when looking into why it failed.
This is my current `tests.yml` file.
As you can see, the command itself runs in the terminal.
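One way to check a command the way CI does is to look at its exit status rather than its output. In GitHub Actions, a `run:` step fails whenever its command exits with a non-zero status, even if it printed what you expected. Below is a minimal Python sketch with a hypothetical helper, `run_like_ci`. The actual command from `tests.yml` isn't shown here, so the demo uses stand-in commands.

```python
import subprocess
import sys


def run_like_ci(cmd, cwd=None):
    """Run `cmd` (a list of arguments) and report pass/fail by exit status.

    Hypothetical helper: a CI step is judged by the command's exit code,
    so a zero exit is a pass and anything else is a failure.
    """
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    passed = result.returncode == 0
    print("PASS" if passed else f"FAIL (exit {result.returncode})")
    if not passed:
        # Python tracebacks (e.g. UnicodeDecodeError) are written to stderr.
        print(result.stderr, file=sys.stderr)
    return passed


if __name__ == "__main__":
    # Stand-ins for the real command, e.g. the docsumvim.py call from tests.yml.
    run_like_ci([sys.executable, "-c", "print('ok')"])
    run_like_ci([sys.executable, "-c", "raise SystemExit(3)"])
```

Running the exact command from the workflow this way, from a fresh clone of the repo, makes it easier to see whether it really exits cleanly.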