Open ay-amityadav opened 5 years ago
Thanks very much! This is a good analysis. I think for the tutorial we will stick with fullDataTables.html. The others are experiments to see who might find them useful.
Please transfer to bottom of ami-search tutorial
ami-version: ami20190218c. The discussion below follows from the output of the command
ami-search-new -p ./ami20190218c/osanctum200/ --dictionary country plantparts
GLOBAL
Let's first look at the four major files generated. I will discuss them in the order that makes the testing process easiest.
full.dataTables.html
This file contains the columns: articles, bibliography, dic:country, dic:plantparts, word:frequencies.
Apart from the bibliography column (mentioned in issue 48), I think we should show a count of 1 explicitly, such as India * 1; otherwise it's confusing.
Probable TODO: I haven't yet checked the correctness of the counts in the table by manually inspecting a paper.
Problem: The column word:frequencies doesn't seem to be sorted in any way. Sorting by count would help the later testing process and is probably the best option here.
commonest.dataTables.html
This file contains the same columns as the one above.
Test: the entry in a particular cell of the columns dic:country, dic:plantparts and word:frequencies is the maximum of the values present in the corresponding cell in full.dataTables.html. The file passes this test.
As above, we could display a count of 1 explicitly, such as India * 1.
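The sorting and explicit-count suggestions above, together with the commonest test, can be sketched roughly as follows. This is only an illustration, not how ami builds the tables: it assumes each cell entry is a string of the form term * count, that a bare term means a count of 1, and all function names are made up for this example.

```python
def parse_entry(text):
    """Split one cell entry into (term, count).

    Assumption: entries look like "India * 3"; a bare "India" means count 1.
    """
    term, sep, count = text.rpartition("*")
    if sep and count.strip().isdigit():
        return term.strip(), int(count)
    return text.strip(), 1


def sort_by_count(entries):
    """Return (term, count) pairs, highest count first, ties broken by term."""
    parsed = [parse_entry(e) for e in entries]
    return sorted(parsed, key=lambda pair: (-pair[1], pair[0]))


def format_entry(term, count):
    """Always show the count, including a count of 1."""
    return f"{term} * {count}"


def commonest(entries):
    """The entry with the highest count, i.e. what the commonest table should show."""
    ranked = sort_by_count(entries)
    return ranked[0] if ranked else None


if __name__ == "__main__":
    cell = ["leaf * 2", "India", "root * 5"]  # made-up cell contents
    print([format_entry(t, c) for t, c in sort_by_count(cell)])
    print(commonest(cell))
```

With the entries sorted this way, the commonest test reduces to comparing the first sorted entry of a full.dataTables.html cell with the matching cell in commonest.dataTables.html.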
count.dataTables.html
The columns dic:country and dic:plantparts seem to contain the total number of words in a paper that come from the respective dictionaries. The values are close to the sum of the counts in the corresponding cell of full.dataTables.html (they are not equal, since full.dataTables.html apparently shows only the top 5 counts in a cell).
Problem: in the column word:frequencies, the value in a cell is less than the value in the corresponding cell of commonest.dataTables.html. Please confirm this.
entries.dataTables.html
Test: The columns dic:country and dic:plantparts contain the number of different terms in the paper that come from the respective dictionaries. Based on the entries in full.dataTables.html for the respective dictionaries, the values in this file seem correct.
Problem: same as with count.dataTables.html: in the column word:frequencies, the value in a cell is less than the value in the corresponding cell of commonest.dataTables.html. Please confirm this.
I suggest we create a separate folder for the above four files, probably named tables.
Files created in the osanctum200 directory related to the country dictionary:
search.country.count.xml - no information
search.country.documents.xml - no information
search.country.snippets.xml - looks OK at the moment, requires a more careful look
I suggest we create a separate folder for the above files, named country.
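The checks described above for count.dataTables.html and entries.dataTables.html can be written down as simple invariants between the four tables. The sketch below is my reading of what the columns mean (count = total hits, entries = number of distinct terms), not a confirmed specification, and it assumes the numbers have already been read out of the HTML by hand or by some other means. All names are hypothetical.

```python
def check_cell(count, entries, commonest_count, visible_counts):
    """Compare the values of one cell across the four tables.

    count            value from count.dataTables.html (assumed: total hits)
    entries          value from entries.dataTables.html (assumed: distinct terms)
    commonest_count  count shown in commonest.dataTables.html
    visible_counts   counts shown in full.dataTables.html (only the top few)
    Returns a list of short problem descriptions; empty means consistent.
    """
    problems = []
    # The full table shows only the top counts, so their sum cannot exceed the total.
    if sum(visible_counts) > count:
        problems.append("visible sum exceeds count")
    # A total can never be smaller than its largest single term.
    if commonest_count > count:
        problems.append("commonest exceeds count")
    # There must be at least as many distinct terms as are displayed.
    if entries < len(visible_counts):
        problems.append("entries below number of visible terms")
    return problems


if __name__ == "__main__":
    # Made-up numbers resembling the word:frequencies problem reported above.
    print(check_cell(count=3, entries=2, commonest_count=7, visible_counts=[7, 4]))
    # Made-up numbers for a dictionary column that looks fine.
    print(check_cell(count=12, entries=6, commonest_count=5, visible_counts=[5, 3, 2, 1, 1]))
```

If the word:frequencies column really holds a total, the reported behaviour trips the second check; if it holds something else, the column's meaning needs documenting instead.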
Files created in the osanctum200 directory related to the plantparts dictionary:
search.plantparts.count.xml - no information
search.plantparts.documents.xml - no information
search.plantparts.snippets.xml - looks OK at the moment, requires a more careful look
I suggest we create a separate folder for the above files, named plantparts.
Files created in the osanctum200 directory related to word frequencies:
word.frequencies.count.xml - no information
word.frequencies.documents.xml - no information
word.frequencies.snippets.xml - looks OK at the moment, requires a more careful look
I suggest we create a separate folder for the above files, named word_frequencies.
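To make the folder suggestions concrete, here is a rough sketch of how the project-level files could be grouped. It is a post-processing illustration only, not a proposal for how ami itself should write its output; the folder names are the ones suggested above, and the function names are hypothetical. Moving files after the fact may break anything that expects them at the top level.

```python
import shutil
from pathlib import Path

# Filename prefix -> suggested folder (names taken from the suggestions above).
PREFIX_TO_FOLDER = {
    "search.country.": "country",
    "search.plantparts.": "plantparts",
    "word.frequencies.": "word_frequencies",
}


def target_folder(filename):
    """Return the suggested folder for a file, or None to leave it in place."""
    if filename.endswith(".dataTables.html"):
        return "tables"
    for prefix, folder in PREFIX_TO_FOLDER.items():
        if filename.startswith(prefix):
            return folder
    return None


def plan_moves(filenames):
    """Map each recognised filename to its suggested folder."""
    plan = {}
    for name in filenames:
        folder = target_folder(name)
        if folder is not None:
            plan[name] = folder
    return plan


def apply_moves(project_dir, plan):
    """Create the folders and move the files listed in the plan."""
    project = Path(project_dir)
    for name, folder in plan.items():
        destination = project / folder
        destination.mkdir(exist_ok=True)
        shutil.move(str(project / name), str(destination / name))


if __name__ == "__main__":
    names = ["full.dataTables.html", "search.country.count.xml",
             "word.frequencies.snippets.xml", "eupmc_result.json"]
    print(plan_moves(names))
```

Keeping plan_moves separate from apply_moves means the grouping can be reviewed before any file is touched.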
PAPER SPECIFIC
Let's look inside the PMC1397864 folder.
search.country.count.xml - no information
search.country.snippets.xml - looks OK at the moment, requires a more careful look
search.plantparts.count.xml - no information
search.plantparts.snippets.xml - looks OK at the moment, requires a more careful look
word.frequencies.count.xml - no information
word.frequencies.snippets.xml - Problem: the file needs to be renamed, since it doesn't contain any snippets; other things look OK. Test: this file and the corresponding cell in the column word:frequencies in full.dataTables.html should agree with each other.
We can probably club the above files together in a folder.
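The agreement test between the per-paper word frequencies file and the word:frequencies cell in full.dataTables.html could look something like the sketch below. It assumes the term counts have already been extracted from both files into plain dictionaries (the parsing is left out because the file layout is not described here), and it relies on the observation that the table apparently shows only the top counts. The function name is hypothetical.

```python
def cell_agrees_with_file(file_counts, cell_counts):
    """Check a table cell against the per-paper frequencies.

    file_counts  term -> count, taken from the per-paper file
    cell_counts  term -> count, taken from the table cell (top counts only)
    """
    # Every term shown in the table must carry the same count as in the file.
    for term, count in cell_counts.items():
        if file_counts.get(term) != count:
            return False
    # An empty cell only agrees with an empty file.
    if not cell_counts:
        return not file_counts
    # No term missing from the table may outrank a term that is shown.
    smallest_shown = min(cell_counts.values())
    hidden = (c for t, c in file_counts.items() if t not in cell_counts)
    return all(c <= smallest_shown for c in hidden)


if __name__ == "__main__":
    per_paper = {"plant": 9, "oil": 4, "leaf": 2}  # made-up counts
    print(cell_agrees_with_file(per_paper, {"plant": 9, "oil": 4}))
    print(cell_agrees_with_file(per_paper, {"oil": 4, "leaf": 2}))
```

Comparing against the smallest shown count, rather than a fixed top 5, avoids having to decide how ties at the cut-off are broken.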
The other three files, eupmc_result.json, fulltext.xml and scholarly.html, look OK to me. We can probably club them together in a folder.
results: Need to look at this folder.
Cooccurrence
NOTE: On this version, I get a folder named __cooccurrence instead of cooccurrence. Need to look at this.
Please find the files related to this issue at: https://github.com/petermr/tigr2ess/tree/master/problems/amit/issue