paracrawl / Domain_Adaptation

InDomain detection is a tool designed to extract in-domain data from a large collections of data.
GNU General Public License v3.0
1 stars 1 forks source link

Order documentation for first use with example #19

Closed kpu closed 4 years ago

kpu commented 5 years ago
  1. What is this tool (which you have here already; good)
  2. Dependencies
  3. Installation including specifying paths to dependencies
  4. Setup parallel corpora. Example of downloading ParaCrawl data.
  5. Specify my domain. Pick I dunno WMT biomedical or something.
  6. Running the top-level tool.

Then after I've done my first run successfully you can tell me about how the tool is implemented internally with various scripts.