wkiri / MTE

Mars Target Encyclopedia
Apache License 2.0

Update jsre_parser.py to skip jSRE if no records to classify #29

Closed: wkiri closed this issue 2 years ago

wkiri commented 2 years ago

This run produced 17 jSRE "no input file" errors on the full set of 1303 MER-A documents:

$ export JSON_FILE=/proj/mte/results/mer-a-jsre-v2-ads.jsonl
$ export JSRE_MODEL=/proj/mte/trained_models/jSRE-lpsc15-merged-binary.model
$ export NER_MODEL=/proj/mte/trained_models/ner_MERA-property-salient.ser.gz
$ python ../../git/src/lpsc_parser.py -li pdfpaths-$MISSION.list -o $JSON_FILE -jr /proj/mte/jSRE/jsre-1.1 -jm $JSRE_MODEL -n $NER_MODEL

[2021-09-01 11:00:07]: LPSC parser failed: /proj/mte/data/corpus-lpsc/mer-pdf/2004_2167.pdf
[2021-09-01 11:00:25]: LPSC parser failed: /proj/mte/data/corpus-lpsc/mer-pdf/2004_2172.pdf
[2021-09-01 11:01:09]: LPSC parser failed: /proj/mte/data/corpus-lpsc/mer-pdf/2004_2184.pdf
[2021-09-01 11:01:38]: LPSC parser failed: /proj/mte/data/corpus-lpsc/mer-pdf/2004_2189.pdf
[2021-09-01 11:03:36]: LPSC parser failed: /proj/mte/data/corpus-lpsc/mer-pdf/2005_1244.pdf
[2021-09-01 11:04:15]: LPSC parser failed: /proj/mte/data/corpus-lpsc/mer-pdf/2005_1337.pdf
[2021-09-01 11:05:11]: LPSC parser failed: /proj/mte/data/corpus-lpsc/mer-pdf/2005_1455.pdf
[2021-09-01 11:14:08]: LPSC parser failed: /proj/mte/data/corpus-lpsc/mer-pdf/2009_1978.pdf
[2021-09-01 11:16:23]: LPSC parser failed: /proj/mte/data/corpus-lpsc/mer-pdf/2010_2013.pdf
[2021-09-01 11:23:02]: LPSC parser failed: /proj/mte/data/corpus-lpsc/mer-pdf/2013_1265.pdf
[2021-09-01 11:23:51]: LPSC parser failed: /proj/mte/data/corpus-lpsc/mer-pdf/2013_1674.pdf
[2021-09-01 11:26:04]: LPSC parser failed: /proj/mte/data/corpus-lpsc/mer-pdf/2014_1518.pdf
[2021-09-01 11:26:13]: LPSC parser failed: /proj/mte/data/corpus-lpsc/mer-pdf/2014_1590.pdf
[2021-09-01 11:33:18]: LPSC parser failed: /proj/mte/data/corpus-lpsc/mer-pdf/2018_1895.pdf
[2021-09-01 11:33:31]: LPSC parser failed: /proj/mte/data/corpus-lpsc/mer-pdf/2018_2286.pdf
[2021-09-01 11:34:57]: LPSC parser failed: /proj/mte/data/corpus-lpsc/mer-pdf/2019_2322.pdf
[2021-09-01 11:36:53]: LPSC parser failed: /proj/mte/data/corpus-lpsc/mer-pdf/2020_2783.pdf

wkiri commented 2 years ago

@stevenlujpl I went hunting and found another lpsc_parser.py process running on mlia-compute under my username (!). I can't find where it was launched from (it was not in any active screen session), but I terminated the process. I suspect it caused the failures above. I will re-run and let you know if I see any further issues.

wkiri commented 2 years ago

@stevenlujpl This run completed with no errors. I suspect that the other process caused the errors above, so please do not spend time on them. I think the open() call must be creating an (empty) input file for jSRE, so we do not see an error even when records is empty. It would be nice to skip the call to jSRE in that case, but it is not a critical change at this time. If you agree, feel free to close this issue. If, however, skipping jSRE for empty records is an easy addition, I think it is a nice update and might shorten runtime a bit.
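The suspicion about open() can be checked in isolation. The snippet below is a minimal sketch, not the actual jsre_parser.py code: the helper name write_records and the file name jsre_input.txt are hypothetical. It shows that opening a file in write mode creates it on disk even when no records are ever written.

```python
import io
import os
import tempfile


def write_records(path, records):
    # Hypothetical helper: opening in "w" mode creates (or truncates) the
    # file immediately, even if the loop below never writes anything.
    with io.open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(record + "\n")


tmp_dir = tempfile.mkdtemp()
input_file = os.path.join(tmp_dir, "jsre_input.txt")  # hypothetical name
write_records(input_file, [])
print(os.path.exists(input_file), os.path.getsize(input_file))  # True 0
```

Because the file exists (just empty), a downstream tool that only checks for the file's presence would not report "no input file".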

stevenlujpl commented 2 years ago

I confirmed that the io.open() call creates an empty file when the records variable is empty. I also added a check to skip the jSRE call if there are no target-element or target-mineral records.
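The guard described above could look roughly like the following. This is a sketch under assumptions: the thread does not show the committed change, and run_jsre and classify_relations are hypothetical names. run_jsre is a placeholder that only logs that it was called, standing in for writing the jSRE input file and invoking jSRE.

```python
calls = []


def run_jsre(records):
    # Hypothetical placeholder for the real jSRE invocation: it records
    # that it was called and returns one dummy label per record.
    calls.append(len(records))
    return ["relation"] * len(records)


def classify_relations(te_records, tm_records):
    # Skip jSRE entirely when there are no target-element or
    # target-mineral records, instead of handing it an empty input file.
    if not te_records and not tm_records:
        return []
    return run_jsre(te_records + tm_records)


print(classify_relations([], []))            # []
print(classify_relations(["record-1"], []))  # ['relation']
print(calls)                                 # [1]
```

Checking the record lists before any file is opened also avoids creating the empty input file in the first place.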

wkiri commented 2 years ago

Excellent!