jusinowicz / metascrape

A PDF-scraping workflow for scientific meta-analysis
MIT License

Do something useful with the abstract_table.csv #1

Open · jusinowicz opened this issue 1 month ago

jusinowicz commented 1 month ago

Using the preliminary output gleaned by scraping abstracts with the NER, create R code that helps sort, visualize, and prioritize papers to follow up on.

jusinowicz commented 1 month ago

Working on R code in metascrape/analysis_in_R/abstract_table_summary.R, which currently:

  1. Counts unique words in each of the columns (which map onto NER labels).
  2. Produces coarse visualizations as bar plots and word clouds.
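The script itself is in R. As a language-neutral illustration of the counting step, here is a minimal Python sketch. It assumes (hypothetically) that abstract_table.csv has a header row of NER label names such as INOCTYPE and one extracted phrase per cell; the function name and the SPECIES column in the usage example are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_phrases_per_column(csv_text):
    """Count how often each phrase occurs in each column of a CSV.

    Assumes the first row holds the column (NER label) names and that each
    cell holds a single phrase. Short rows are tolerated: zip() stops at
    the shorter of header and row.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    counts = {name: Counter() for name in header}
    for row in body:
        for name, value in zip(header, row):
            counts[name][value] += 1
    return counts


sample = "INOCTYPE,SPECIES\nAMF,maize\nAMF,wheat\nrhizobia,maize\n"
counts = count_phrases_per_column(sample)
print(counts["INOCTYPE"].most_common())  # [('AMF', 2), ('rhizobia', 1)]
```

The number of unique entries in a column is then `len(counts[name])`, and `most_common()` gives the ordering a bar plot would use.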

Improvement ideas:

  1. Filter out NULL/NA/NaN entries.
  2. Filter out all singletons (phrases that have only one entry), or find clever ways to group singletons?
  3. In certain cases, phrases can be grouped by shared root words, e.g. in INOCTYPE: AMF vs. AM vs. arbuscular m fungi vs. fungus, etc.
  4. Deal with case: make comparisons case-insensitive.
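The four ideas above could be combined into one cleaning pass. Below is a minimal Python sketch of that pass, not the R implementation. The set of missing-value tokens, the `min_count` threshold, and the regex rules in `ROOT_GROUPS` are all hypothetical and would need curating per column; the rules only illustrate the INOCTYPE example and are deliberately crude.

```python
import re
from collections import Counter

# Cell values treated as missing, compared after lower-casing (an assumption).
MISSING = {"", "null", "na", "nan", "none"}

# Hypothetical root-word rules as (regex, group label), checked in order;
# the first match wins. Illustrative only, not a curated vocabulary.
ROOT_GROUPS = [
    (re.compile(r"\barbuscular\b"), "amf"),
    (re.compile(r"\bamf?\b"), "amf"),  # matches "am" and "amf"
    (re.compile(r"\bfung(?:i|us)\b"), "fungi"),
]


def normalize(phrase):
    """Lower-case, trim, and collapse runs of whitespace (case-insensitivity)."""
    return re.sub(r"\s+", " ", phrase.strip().lower())


def group_key(phrase, rules=ROOT_GROUPS):
    """Map a normalized phrase to the label of the first matching rule."""
    for pattern, label in rules:
        if pattern.search(phrase):
            return label
    return phrase


def clean_counts(phrases, min_count=2, rules=ROOT_GROUPS):
    """Count grouped phrases, dropping missing values and rare groups.

    Grouping happens before the min_count filter, so variants that are
    singletons on their own can survive once merged into a group.
    """
    counts = Counter()
    for p in phrases:
        if p is None:
            continue
        norm = normalize(p)
        if norm in MISSING:
            continue
        counts[group_key(norm, rules)] += 1
    return Counter({k: v for k, v in counts.items() if v >= min_count})


raw = ["AMF", "amf ", "AM", "arbuscular m fungi", "fungus", "NA", "NaN", "", "rhizobia"]
print(clean_counts(raw))  # Counter({'amf': 4})
```

Grouping before dropping singletons is the main design choice here: filtering first would discard "AM" and "arbuscular m fungi" even though they plausibly belong with "AMF".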

Still need to think about how exactly to use this to help target or organize database expansion.

jusinowicz commented 1 month ago

That phase was successful, but what next?