wkiri / MTE

Mars Target Encyclopedia
Apache License 2.0

Update ingest_sqlite.py to remove LPSC-specific content #18

Closed wkiri closed 2 years ago

wkiri commented 2 years ago

To support non-LPSC content (like journals), we will need to update these functions:

https://github.com/wkiri/MTE/blob/314723342febecd90ccac5a6581c6af9353407d3/src/ingest_sqlite.py#L93-L113

stevenlujpl commented 2 years ago

@wkiri I added a command-line argument, -v or --venue, to the ingest_sqlite.py script to make populating LPSC-specific content optional. The argument accepts one of two values, lpsc or others, and defaults to others, which leaves the doc_url and venue fields untouched. If we run ingest_sqlite.py with -v=lpsc or --venue=lpsc, the doc_url and venue fields are populated by calling the construct_doc_url() and update_doc_venue() functions. For example:

python ingest_sqlite.py lpsc.jsonl -d lpsc.db -v lpsc
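
The option handling described in this comment could look roughly like the sketch below. It is not the code in ingest_sqlite.py: the names build_parser, fill_venue_fields, in_file, and db_file are hypothetical, and the placeholder strings stand in for whatever construct_doc_url() and update_doc_venue() actually produce, since their signatures are not shown in this thread.

```python
import argparse


def build_parser():
    """Build a parser with the options described in this comment (sketch only)."""
    parser = argparse.ArgumentParser(description='Sketch of the --venue option')
    # Positional input file and -d database file; the dest names are assumptions.
    parser.add_argument('in_file', help='JSONL file to ingest')
    parser.add_argument('-d', dest='db_file', help='SQLite database file')
    # Only 'lpsc' or 'others' are accepted; 'others' is the default.
    parser.add_argument('-v', '--venue', choices=['lpsc', 'others'],
                        default='others',
                        help='populate LPSC-specific fields only for lpsc')
    return parser


def fill_venue_fields(doc, venue):
    """Return a copy of doc, adding doc_url and venue only when venue is 'lpsc'.

    Hypothetical stand-in: the real script calls construct_doc_url() and
    update_doc_venue() here. The values below are placeholders, not real data.
    """
    doc = dict(doc)
    if venue == 'lpsc':
        doc['doc_url'] = 'placeholder-url-for-' + doc['doc_id']
        doc['venue'] = 'placeholder-venue'
    return doc


# Parse the same arguments as the example command above.
args = build_parser().parse_args(['lpsc.jsonl', '-d', 'lpsc.db', '-v', 'lpsc'])
record = fill_venue_fields({'doc_id': 'example-id'}, args.venue)
print(sorted(record))
```

Using choices lets argparse reject any value other than lpsc or others before ingestion starts, and the others default keeps existing invocations unchanged.
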

stevenlujpl commented 2 years ago

@wkiri Please let me know if you have any questions or suggestions. If not, I think we can close this issue. Thanks.

wkiri commented 2 years ago

I ran this on PHX and MER-A content and it works as expected. Thanks!

wkiri commented 2 years ago

PHX was run before the ADS updates, so 'venue' is empty if ingest_sqlite.py is run without -v lpsc.