paracrawl / Domain_Adaptation

InDomain detection is a tool designed to extract in-domain data from a large collection of data.
GNU General Public License v3.0

Incorrect data: output corpus twice the size of input corpus #25

Closed by wwaites 5 years ago

wwaites commented 5 years ago

Amir writes:

I managed to run it, but the resulting corpus is almost twice the size of
the actual corpus, so it is definitely not generating the correct data.

dalisola commented 5 years ago

Can I see the command that you ran, please?

dionwiggins commented 5 years ago

Resolved by recent update.