tetherless-world / mowgli-etl

DARPA Machine Common Sense (MCS) Multi-modal Open World Grounded Learning and Inference (MOWGLI) Extract-Transform-Load sub-project
MIT License

Ellinj2 wdc refactoring #180

Closed by ellinj2 3 years ago

ellinj2 commented 3 years ago

Did some preliminary refactoring. The major differences are the use of new data structures and the removal of the random file testing. I replaced the random testing infrastructure with a function that yields randomly selected entries (still in corpus order) from the large file as OffersCorpusEntry objects. It's kind of hard to test a random process, since I don't know in advance which entries will be tested...
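For illustration, here is a minimal sketch of an in-order random sampler of the kind described above. It is not the code from this change: `sample_in_order`, `probability`, and `seed` are hypothetical names, and plain integers stand in for OffersCorpusEntry objects. Passing a fixed seed is one possible way to make such a sampler reproducible enough to test.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def sample_in_order(
    entries: Iterable[T], probability: float, seed: Optional[int] = None
) -> Iterator[T]:
    """Yield each entry with the given probability, preserving input order.

    Hypothetical helper: the entries are streamed one at a time, so the
    large file never has to be loaded into memory as a whole.
    """
    # A private Random instance keeps the sampler independent of global state;
    # a fixed seed makes the selection repeatable across runs.
    rng = random.Random(seed)
    for entry in entries:
        if rng.random() < probability:
            yield entry


# Stand-in corpus: integers instead of real corpus entries.
first_run = list(sample_in_order(range(1000), probability=0.05, seed=42))
second_run = list(sample_in_order(range(1000), probability=0.05, seed=42))
```

With the same seed, both runs select the same entries, so a test can assert on the sampled output without knowing the selection in advance.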

gordom6 commented 3 years ago

The random sample was there simply to avoid biasing toward a single site's entries, which are grouped together in the corpus. You could accomplish the same thing by jumping around in the corpus in a deterministic way.
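One way to jump around deterministically is to take every n-th entry, which spreads the sample across sites while staying fully repeatable. The sketch below assumes the corpus can be read as an ordered stream of entries; `sample_every_nth`, `stride`, and `offset` are hypothetical names, and integers again stand in for corpus entries.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_every_nth(
    entries: Iterable[T], stride: int, offset: int = 0
) -> Iterator[T]:
    """Return every `stride`-th entry, starting at index `offset`.

    Hypothetical helper: a fixed stride skips over runs of entries from
    the same site without relying on any randomness.
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # islice consumes the stream lazily, so the full corpus is never held in memory.
    return islice(entries, offset, None, stride)


# Stand-in corpus of ten entries.
picked = list(sample_every_nth(range(10), stride=3))
shifted = list(sample_every_nth(range(10), stride=3, offset=1))
```

A stride larger than the typical number of consecutive entries per site would be needed to actually avoid the grouping; that size is not stated in this thread, so it is left as a parameter.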