sillsdev / cog

Cog is a tool for comparing languages using lexicostatistics and comparative linguistics techniques.
http://sillsdev.github.io/cog/
MIT License

Abusing COG with an obnoxious amount of data. #98

Closed MattGyverLee closed 9 months ago

MattGyverLee commented 12 months ago

I'm not sure if this is a congrats or an issue. I am definitely not using Cog as directed.

I'm working on a data project and I gave Cog over a thousand words in five hundred (not necessarily related) languages (the .cogx file is, surprisingly, only 15MB).

I tried this twice. First, I pasted the file and it took about 10 minutes after uploading the TSV before it woke up. I noticed it was using 8GB of RAM, and considered that it might have been because of some failed pastes. I saved and closed, but when I reopened the project I gave up after half an hour of "loading views".

Last night, I re-imported the data, clicked on compare all varieties, put my computer to sleep, and went to bed. This morning, I woke it back up, and it's still going. [image]

I've been watching Cog use over 10GB of RAM to do its work, and so far it hasn't crashed, which is impressive. I'm not sure I'll be waiting 48 days...but...

Let me know if you want the file to do benchmark testing.

ddaspit commented 11 months ago

That is way beyond what Cog was designed to handle. I've never tried to run a dataset that large before. I'm glad it hasn't crashed yet, but it could take a really long time to complete (depending on your raw CPU power).
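
For a rough sense of scale, here is a back-of-the-envelope sketch of how the workload grows. It assumes that comparing all varieties means comparing every pair of varieties once, and that each variety has one word per meaning. Neither assumption is confirmed here as how Cog actually works, and `pairwise_workload` is a hypothetical helper, not part of Cog. The figures 500 and 1000 are the approximate sizes mentioned above.

```python
from math import comb


def pairwise_workload(num_varieties: int, num_meanings: int) -> tuple[int, int]:
    """Estimate the size of an all-pairs comparison (hypothetical helper).

    Assumes every unordered pair of varieties is compared once and that
    each variety contributes exactly one word per meaning.
    """
    # Number of unordered variety pairs: n * (n - 1) / 2
    variety_pairs = comb(num_varieties, 2)
    # One word-to-word comparison per meaning for each variety pair
    word_pair_comparisons = variety_pairs * num_meanings
    return variety_pairs, word_pair_comparisons


# Approximate sizes from the report: ~500 varieties, ~1000 words each
pairs, comparisons = pairwise_workload(500, 1000)
print(f"{pairs:,} variety pairs, {comparisons:,} word-pair comparisons")
```

The number of variety pairs grows quadratically, so under these assumptions doubling the number of varieties roughly quadruples the work, while doubling the word list only doubles it.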