petermr / openVirus

aggregation of scholarly publications and extracted knowledge on viruses and epidemics.
The Unlicense

creation of dictionaries from wikipedia pages #9

Open petermr opened 4 years ago


petermr commented 4 years ago

Material taken from the tutorial at https://github.com/petermr/tigr2ess/blob/master/dictionaries/TUTORIAL.md, which was written for food crops but should translate to other topics.

petermr commented 4 years ago

strategy

Dictionary strategy from: https://github.com/petermr/tigr2ess/blob/master/dictionaries/TUTORIAL.md

example for plants

ami-dictionary create --informat wikipage --input https://en.wikipedia.org/wiki/Ocimum_tenuiflorum --dictionary otenuiflorum --directory mydictionaries --outformats xml,html
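To make the idea of building a dictionary from a Wikipedia page concrete, here is a rough Python sketch of how candidate terms could be harvested from a page's HTML. It assumes (this is not confirmed here, and it is not ami-dictionary's actual code) that terms come from the page's internal article links of the form `/wiki/Title`; the names `WikiLinkCollector` and `candidate_terms` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class WikiLinkCollector(HTMLParser):
    """Hypothetical helper: collects titles of internal article links (/wiki/Title).

    ASSUMPTION: candidate dictionary terms are the page's internal article links.
    """

    def __init__(self) -> None:
        super().__init__()
        self.titles: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if not href.startswith("/wiki/"):
            return  # skip external links and non-article URLs
        # Drop any #fragment and decode percent-escapes (e.g. %C3%A9 -> é).
        title = unquote(href[len("/wiki/"):].split("#", 1)[0])
        # Skip namespaced pages such as File:, Category:, Help:.
        if not title or ":" in title:
            return
        term = title.replace("_", " ")
        if term not in self.titles:  # keep first occurrence, preserve page order
            self.titles.append(term)


def candidate_terms(html: str) -> list[str]:
    """Return unique article-link titles found in a Wikipedia page's HTML."""
    collector = WikiLinkCollector()
    collector.feed(html)
    return collector.titles
```

Feed it the page HTML (for example, fetched with `urllib.request.urlopen`). Skipping every title that contains a colon is a simplification: it drops File:/Category: pages but would also drop the occasional article whose title contains a colon.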

Wikipedia page

sars_covid_2

https://en.wikipedia.org/wiki/Severe_acute_respiratory_syndrome_coronavirus_2

cd projects/openVirus
ami-dictionary create --informat wikipage --input https://en.wikipedia.org/wiki/Severe_acute_respiratory_syndrome_coronavirus_2 --dictionary sars_covid_2 --directory dictionaries --outformats xml,html

NOTE: this takes about 1 s per entry because each entry is downloaded individually. Nothing is written to disk until all entries have been processed.
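As a rough guide, the total run time t scales with the number of entries N. Using the approximate 1 s per entry figure above and the 123 entries produced for sars_covid_2:

$$
t \approx N \times 1\,\text{s} = 123 \times 1\,\text{s} \approx 2\,\text{min}
$$

So expect a couple of minutes with no output before the files appear; larger pages will take proportionally longer.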

creates:

/Users/pm286/projects/openVirus/dictionaries/sars_covid_2

with 123 entries (some are irrelevant and will need to be edited out).
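Editing out the irrelevant entries can be done by hand, or with a small script like the Python sketch below. It assumes (not confirmed here) that the generated XML has a root element whose `<entry>` children carry a `term` attribute; `remove_entries` is a hypothetical helper, and matching is exact and case-sensitive.

```python
import xml.etree.ElementTree as ET


def remove_entries(xml_text: str, unwanted_terms: set[str]) -> str:
    """Return the dictionary XML with entries whose term is in unwanted_terms removed.

    ASSUMPTION: entries are <entry> children of the root element with a 'term' attribute.
    """
    root = ET.fromstring(xml_text)
    # findall() returns a new list, so removing children while looping over it is safe.
    for entry in root.findall("entry"):
        if entry.get("term") in unwanted_terms:
            root.remove(entry)
    return ET.tostring(root, encoding="unicode")
```

Read the generated XML file under `dictionaries/` (its exact file name is not stated above), pass the text in together with the terms to drop, and write the result back after checking it.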

pandemic

ami-dictionary create --informat wikipage --input https://en.wikipedia.org/wiki/2019–20_coronavirus_pandemic --dictionary 201920_covid_pandemic --directory dictionaries --outformats xml,html
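Since each dictionary uses the same command with only the page URL and dictionary name changing, several pages can be batched. The Python sketch below rebuilds exactly the invocation shown above (no other flags are assumed) for a hypothetical list `PAGES`; by default it only prints the commands, and it should be run from `projects/openVirus` so that `--directory dictionaries` resolves as in the examples.

```python
import shlex
import subprocess


def ami_dictionary_args(url: str, name: str, directory: str = "dictionaries") -> list[str]:
    """Build the ami-dictionary invocation used in this issue for one Wikipedia page."""
    return [
        "ami-dictionary", "create",
        "--informat", "wikipage",
        "--input", url,
        "--dictionary", name,
        "--directory", directory,
        "--outformats", "xml,html",
    ]


# Hypothetical batch: (Wikipedia URL, dictionary name) pairs taken from this issue.
PAGES = [
    ("https://en.wikipedia.org/wiki/Severe_acute_respiratory_syndrome_coronavirus_2", "sars_covid_2"),
    ("https://en.wikipedia.org/wiki/2019–20_coronavirus_pandemic", "201920_covid_pandemic"),
]


def main(dry_run: bool = True) -> None:
    for url, name in PAGES:
        args = ami_dictionary_args(url, name)
        if dry_run:
            print(shlex.join(args))  # shell-quoted command line, for review
        else:
            # Runs one page at a time; each run takes roughly 1 s per entry.
            subprocess.run(args, check=True)


if __name__ == "__main__":
    main()
```

Passing an argument list to `subprocess.run` avoids shell quoting problems with the non-ASCII en dash in the pandemic URL.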