shaigue / pmi_masking

This repository contains code that takes a text corpus and creates a PMI masking vocabulary for it.
MIT License

reproduce results on wiki+bookcorpus #10

Open: shaigue opened this issue 1 year ago

shaigue commented 1 year ago

Ideally we would have about 200 GB of disk and at least 4 CPUs for this (expected run time is ~4 days; more CPUs make it faster). Since we have the original results, we can compare ours against them as a sanity check.
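Before launching a multi-day run, it may help to check the machine against the rough requirements above. The following is a minimal preflight sketch that assumes the thresholds from this comment (~200 GB free disk, at least 4 CPUs). The function name `check_resources` is hypothetical and not part of the repository.

```python
import os
import shutil

# Thresholds taken from the issue comment: ~200 GB of disk, at least 4 CPUs.
REQUIRED_DISK_GB = 200
REQUIRED_CPUS = 4


def check_resources(path: str = ".", disk_gb: float = REQUIRED_DISK_GB,
                    cpus: int = REQUIRED_CPUS) -> list[str]:
    """Return a list of problems; an empty list means the machine looks sufficient."""
    problems = []
    # Free space on the filesystem that holds `path`, in decimal gigabytes.
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < disk_gb:
        problems.append(f"only {free_gb:.1f} GB free at {path!r}, need ~{disk_gb} GB")
    # os.cpu_count() reports logical CPUs on the machine (may be None).
    n_cpus = os.cpu_count() or 1
    if n_cpus < cpus:
        problems.append(f"only {n_cpus} CPUs available, need at least {cpus}")
    return problems


if __name__ == "__main__":
    issues = check_resources()
    if issues:
        for msg in issues:
            print("WARNING:", msg)
    else:
        print("Resources look sufficient for the wiki+bookcorpus run.")
```

Pointing `path` at the directory where intermediate files will be written matters, since that is the filesystem that fills up.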

UPDATE 22.06.2023: Create a bash script that runs this and submit it to the cluster.
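The issue does not say which scheduler the cluster uses. Assuming it is SLURM, here is a minimal sketch that builds such a batch script with the resource figures from the first comment (4 CPUs, ~200 GB local disk, ~4 days, padded here to a 5-day limit). The helper `make_sbatch_script` and the command `python run_pipeline.py` are hypothetical placeholders, not the repository's actual entry point.

```python
def make_sbatch_script(command: str, cpus: int = 4,
                       time_limit: str = "5-00:00:00",
                       tmp_disk: str = "200G",
                       job_name: str = "pmi-wiki-bookcorpus") -> str:
    """Return the text of a SLURM batch script that runs `command`.

    Assumes a SLURM cluster; #SBATCH directives must precede any commands.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",  # more CPUs should shorten the run
        f"#SBATCH --time={time_limit}",     # days-hours:minutes:seconds
        f"#SBATCH --tmp={tmp_disk}",        # minimum node-local scratch disk
        "set -euo pipefail",                # stop on the first failing command
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical entry point: replace with the repository's real command.
    print(make_sbatch_script("python run_pipeline.py"), end="")
```

Saving the printed text as, say, `run_wiki_bookcorpus.sh` and submitting it with `sbatch run_wiki_bookcorpus.sh` would queue the job; whether `--tmp` is honored depends on the cluster's configuration.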

UPDATE 27.06.2023: Running this on my machine does not work because it runs out of disk space, so we need to run it on the cluster.

shaigue commented 1 year ago

Since I am hitting the Wikipedia bug, I want to do this with a different dataset of similar size.