n-waves / multifit

Code to reproduce the results from the paper "MultiFiT: Efficient Multi-lingual Language Model Fine-tuning": https://arxiv.org/abs/1909.04761
MIT License

Download music/books data in German version #76

Open · suryapa1 opened this issue 4 years ago

suryapa1 commented 4 years ago

prepare_cls.py:

Could you please share a public URL for fetching the CLS books/music data in the German version?

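```python
import urllib.request
from pathlib import Path

import fire

# lang_codes is not defined in this excerpt; assumed here from the
# language list in the docstring below.
lang_codes = ["en", "fr", "de", "jp"]


def fetch_cls(url_prefix, cls_path="data/cls"):
    """
    Fetch CLS from server using basic auth.

    url_prefix should point to CLS stored as follows:
    "https://user:passwd@server/path/[en|fr|de|jp]/[dvd|music|books].[test|train|unlabeled].csv"

    data/cls/de-music/models/sp15k
    """
    def fetch(url, dest):
        # Create the target directory, then download url to the local file dest.
        dest.parent.mkdir(parents=True, exist_ok=True)
        print("fetching", url, dest)
        urllib.request.urlretrieve(url, dest)

    for code in lang_codes:
        for category in ['music']:
            out_dir = Path(cls_path) / f'{code}-{category}'
            fetch(f"{url_prefix}/{code}/{category}/train.csv", out_dir / f"{code}.train.csv")
            fetch(f"{url_prefix}/{code}/{category}/test.csv", out_dir / f"{code}.test.csv")
            # The unlabeled split is stored locally as "<code>.unsup.csv".
            fetch(f"{url_prefix}/{code}/{category}/unlabeled.csv", out_dir / f"{code}.unsup.csv")


if __name__ == "__main__":
    fire.Fire(fetch_cls)
```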
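Two details matter when adapting this script to the German books and music data. First, the loop fetches `{url_prefix}/{code}/{category}/{split}.csv`, not the `{category}.{split}.csv` pattern given in the docstring. Second, as far as I know, `urllib.request` does not turn `user:passwd@` in a URL into a basic-auth header by itself. The following is a minimal sketch, assuming the server follows the code's layout (not the docstring's) and expects HTTP basic auth as the docstring states. It moves the credentials into an explicit `Authorization: Basic` header. The helpers `cls_targets`, `basic_auth_request` and `download` are hypothetical names, not part of the multifit repository, and no public CLS URL is assumed.

```python
import base64
import urllib.parse
import urllib.request
from pathlib import Path

# Hypothetical helpers for illustration; not part of the multifit repository.


def cls_targets(url_prefix, codes=("de",), categories=("books", "music"), cls_path="data/cls"):
    """Yield (url, local_path) pairs using the same layout as fetch_cls."""
    # Remote split name -> local file suffix (fetch_cls saves unlabeled as "unsup").
    splits = {"train": "train", "test": "test", "unlabeled": "unsup"}
    for code in codes:
        for category in categories:
            out_dir = Path(cls_path) / f"{code}-{category}"
            for remote, local in splits.items():
                yield (f"{url_prefix}/{code}/{category}/{remote}.csv",
                       out_dir / f"{code}.{local}.csv")


def basic_auth_request(url):
    """Strip user:passwd from the URL and send it as a basic-auth header instead."""
    parts = urllib.parse.urlsplit(url)
    if parts.username is None:
        return urllib.request.Request(url)
    host = parts.hostname
    if ":" in host:  # IPv6 literal: hostname drops the brackets, so restore them
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    clean_url = urllib.parse.urlunsplit(parts._replace(netloc=netloc))
    # urlsplit leaves percent-escapes in username/password, so decode them first.
    user = urllib.parse.unquote(parts.username)
    password = urllib.parse.unquote(parts.password or "")
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return urllib.request.Request(clean_url, headers={"Authorization": f"Basic {token}"})


def download(url, dest):
    """Download url to dest, creating parent directories as needed."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(basic_auth_request(url)) as resp, open(dest, "wb") as f:
        f.write(resp.read())
```

Assuming a server laid out the way the code expects, the German books and music splits would then be fetched with `for url, dest in cls_targets("https://user:passwd@server/path"): download(url, dest)`. The original script can also be run from the command line through fire, e.g. `python prepare_cls.py --url_prefix=https://user:passwd@server/path`.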