Closed TuukkaOT closed 3 years ago
My computer crashed when I ran the code with the Wikipedia dump, so I limited the app to iterate over only the first 1000 lines for now. Also, the dump was >900 GB, so GitHub won't accept it; I changed the code so that the user inputs a path to a plain text file that they have locally. Also, when you convert the file with xml2txt, remember to add -articles as a parameter, because otherwise the program cannot split the articles.
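The line-limiting workaround described above could look something like this minimal sketch (the actual repo code isn't shown here, so `read_first_lines` and the file-reading details are assumptions, not our real function names):

```python
from itertools import islice

def read_first_lines(path, limit=1000):
    # Hypothetical sketch: instead of reading the whole dump into memory,
    # only iterate over the first `limit` lines of the user-supplied file.
    lines = []
    with open(path, encoding="utf-8") as f:
        for line in islice(f, limit):
            lines.append(line.rstrip("\n"))
    return lines
```

Using `islice` means the rest of the file is never touched, so the cap costs nothing even on a huge dump.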
I tried to run our program with the Finnish Wikipedia corpus. It worked when the program accessed just the first 100000 lines. I removed that piece of code and tried with the whole thing, but my computer wasn't able to handle it either. I was really hopeful since it ran for several minutes, but sadly it crashed at the end.
I don't know how to improve our program to handle bigger files within the capacity of our home laptops. I was just going to ask this on Slack, but I saw you had gone there already, Tuukka! 😄 Let's see if Mathias and Raul have any suggestions!
I'm just glad the problem wasn't my computer :) I added a progress bar for when the file is loading, so at least waiting is less frustrating when you know the program is still running.
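For reference, a loading progress indicator can be done with just the standard library by comparing bytes read against the file size. This is only a sketch of the idea (the PR may well use a library like tqdm instead; `read_with_progress` is a hypothetical name, not our actual function):

```python
import os
import sys

def read_with_progress(path, step=10):
    # Hypothetical sketch: yield lines while printing a percentage
    # to stderr every `step` percent, based on bytes read so far.
    total = os.path.getsize(path) or 1  # avoid division by zero on empty files
    read = 0
    next_mark = step
    with open(path, encoding="utf-8") as f:
        for line in f:
            read += len(line.encode("utf-8"))
            pct = 100 * read // total
            if pct >= next_mark:
                sys.stderr.write(f"\rloading: {pct}%")
                next_mark = pct - pct % step + step
            yield line
    sys.stderr.write("\n")
```

Because it streams line by line, this also avoids holding the whole dump in memory, which might help with the crashes on the full corpus.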