Closed anhduc114 closed 4 years ago
Sure, thanks for telling me about this! Does the same thing happen when you run the test URLs included in the repo in the /tests directory? This is a subset of URLs that I know should work.
newsroom-scrape --urls tests/urls.txt --archive tests.archive
newsroom-extract --archive tests.archive --dataset tests.data
I've also heard of other people having trouble using the repo with Python 3.5 or lower. What versions of Python and the repository dependencies are you using (pip freeze)?
beautifulsoup4
click
nltk
readability-lxml
requests
tqdm
numpy
ujson
spacy
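If it helps, here is a minimal sketch for collecting that information in one go. It is not part of the repo; the helper names installed_versions and report are made up for illustration, and it assumes Python 3.8 or newer, where importlib.metadata is in the standard library. On an older interpreter, python --version plus pip freeze gives the same information.

```python
import sys
from importlib import metadata  # standard library from Python 3.8 onward

# Package names copied from the dependency list above.
PACKAGES = [
    "beautifulsoup4", "click", "nltk", "readability-lxml",
    "requests", "tqdm", "numpy", "ujson", "spacy",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def report(packages=PACKAGES):
    """Build a short text report: Python version first, then one line per package."""
    lines = ["python " + sys.version.split()[0]]
    for name, version in installed_versions(packages).items():
        if version is None:
            lines.append("{} (not installed)".format(name))
        else:
            lines.append("{}=={}".format(name, version))
    return "\n".join(lines)


if __name__ == "__main__":
    print(report())
```

Pasting the printed report into the thread should be enough to compare environments.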
I'll see if I can replicate what you're seeing on my machine.
It seems the issue appears when I try to extract the dev.archive file before all of the data has finished downloading. Once I downloaded the full dev.archive, the extraction commands ran fine.
When I run the command newsroom-extract --archive dev.archive --dataset dev.data to obtain the data file, the command line shows this message: "Loading downloaded summaries: Aborted!" Then it stops. Can you help fix this? Thanks.