medvidov / IMaSC

Intelligent Mission and Scientific Instrument Classification. Applying unique NLP approaches to improve information extraction through scientific papers/Foundry A-Team Studies.
Apache License 2.0

Determine what to label/not label #17

Closed medvidov closed 4 years ago

vc1492a commented 4 years ago

@medvidov make sure to add your issues to the appropriate milestone and project, and add labels, so that they can be used effectively in the project board! Thanks.

vc1492a commented 4 years ago

@medvidov added it to the project and appropriate milestone for you. Please be sure to add at least those two; if you don't, your issues won't show up in the project board!

As for what to annotate / not to annotate in the provided dataset, let's start with these two broad classes: instruments and spacecraft / missions.

The Microwave Limb Sounder (MLS) publications include a set of named instruments and missions related to Earth science, specifically atmospheric science. The first task is to look at a random sample of those documents, say 30 to 50 of them, and decide which instruments and spacecraft / missions we want to extract from the dataset.

Feel free to start building that list below. Ping me here once you have a list for both together! We can review and go from there.

vc1492a commented 4 years ago

Analysis of CO in the tropical troposphere using Aura satellite data and the GEOS-Chem model: insights into transport characteristics of the GEOS meteorological products.pdf

See some of the examples in the PDF!

Also, the JPL project reference list (PRL) may help!

vc1492a commented 4 years ago

The field in the JSON is called text. There's a script you can use to read each document in the JSON; the relevant field in each document is text.
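For reference, here is a minimal sketch of reading the text field and drawing a random sample for review. It is not the project script mentioned above. It assumes the JSON file is an array of objects that each carry a text key, which may not match the real layout, and the function names load_texts and sample_texts are hypothetical.

```python
import json
import random


def load_texts(path):
    """Return the text field of every document in a JSON file.

    Assumption: the file holds a JSON array of objects, each with a
    "text" key. Documents with a missing or empty text field are skipped.
    """
    with open(path, encoding="utf-8") as f:
        documents = json.load(f)
    return [doc["text"] for doc in documents if doc.get("text")]


def sample_texts(texts, k=30, seed=0):
    """Draw up to k texts at random, without replacement.

    A fixed seed keeps the sample reproducible, so everyone reviewing
    the annotation list looks at the same documents.
    """
    rng = random.Random(seed)
    return rng.sample(texts, min(k, len(texts)))
```

Something like sample_texts(load_texts("publications.json"), k=30) would then return 30 document texts to read through, where the filename is only a placeholder. Raising k to 50 covers the upper end of the suggested sample size.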

medvidov commented 4 years ago

I have no account for the PRL, but I will go through and read this article/a few others and see what it makes sense to annotate!

vc1492a commented 4 years ago

I think you just need to be on the VPN to access the PRL - are you having trouble accessing it?

medvidov commented 4 years ago

I think models could/should be added as a third class, so what I would annotate is the following:

medvidov commented 4 years ago

Will add models if the model is working for instruments and spacecraft.