neuroelectro / neuroelectro_org

The NeuroElectro Project: Compiling information on neuron electrophysiology through literature text-mining.
neuroelectro.org
GNU General Public License v2.0

Implement a text-mining method for identifying "n" in data tables #54

Open stripathy opened 9 years ago

stripathy commented 9 years ago

e.g. http://dev.neuroelectro.org/neuroelectro/data_table/200/

We discussed making this a special experimental factor or metadata, but this needs to be considered carefully.

[Screenshot from 2015-07-08 16:33:41]

stripathy commented 9 years ago

it'd be good to collect a few more examples of this

stripathy commented 9 years ago

another common example is to annotate a neuron type like "FS (n = 10)"

stripathy commented 9 years ago

there's code for addressing a lot of this in https://github.com/neuroelectro/neuroelectro_org/blob/master/article_text_mining/html_table_decode.py

like this handy function

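import re

def parensResolver(inStr):
    # Find parenthesised text; the greedy '.+' spans from the first '(' to the last ')'
    parensCheck = re.findall(r'\(.+\)', inStr)
    insideParens = None
    if len(parensCheck) > 0:
        # Keep only the text inside the parentheses, e.g. 'n = 10' from 'FS (n = 10)'
        insideParens = parensCheck[0].strip('()')
    # Remove the parenthesised text from the original string
    newStr = re.sub(r'\(.+\)', '', inStr)
    return newStr, insideParens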
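As a follow-on, here is a minimal sketch of how the sample size could be pulled out of a label such as "FS (n = 10)". It is not code from the repository: `extract_n` and `_N_PATTERN` are hypothetical names, and the assumption is that "n" is written as `n = <integer>` (or `n: <integer>`) inside parentheses.

```python
import re

# Hypothetical pattern: matches "n = 10", "N=10" or "n: 10" inside a label.
_N_PATTERN = re.compile(r'\bn\s*[=:]\s*(\d+)', re.IGNORECASE)

def extract_n(label):
    """Return (label_without_parens, n), where n is an int or None."""
    # Collect each parenthesised group separately (no nesting assumed).
    groups = re.findall(r'\(([^()]*)\)', label)
    n = None
    for group in groups:
        match = _N_PATTERN.search(group)
        if match:
            n = int(match.group(1))
            break
    # Strip every parenthesised group and surrounding whitespace from the label.
    clean = re.sub(r'\s*\([^()]*\)', '', label).strip()
    return clean, n

print(extract_n("FS (n = 10)"))  # ('FS', 10)
print(extract_n("Rin (MΩ)"))     # ('Rin', None)
```

Matching each group on its own avoids the greedy `.+` behaviour when a label carries more than one parenthetical, at the cost of not handling nested parentheses.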

stripathy commented 9 years ago

try this one too: http://dev.neuroelectro.org/data_table/807/

stripathy commented 9 years ago

another good example for this: http://dev.neuroelectro.org/data_table/127/

dtebaykin commented 9 years ago

For now: deal with a single N per neuron type.

Two cases to handle on the backend:
1) N in the header: annotate it as metadata and give it a value.
2) N in a cell: annotate N and leave the value blank; the backend then has to parse the values in the table and attach them to the correct nedm.
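The two cases above could be sketched roughly as follows. This is only an illustration under stated assumptions: the table is modelled as a plain list of header labels plus a list of rows, the "N in a cell" case is assumed to appear as a row labelled "n", and `n_per_column` and `N_IN_LABEL` are hypothetical names rather than anything in the backend.

```python
import re

# Hypothetical pattern for a header annotation such as "(n = 10)".
N_IN_LABEL = re.compile(r'\(\s*n\s*=\s*(\d+)\s*\)', re.IGNORECASE)

def n_per_column(header, rows):
    """Map each data column label to its n, or None if no n was found.

    header: column labels; header[0] labels the row-name column.
    rows: list of rows of cell strings; row[0] is the row label.
    """
    result = {}
    # Case 1: n is in the header, e.g. "FS (n = 10)".
    for label in header[1:]:
        match = N_IN_LABEL.search(label)
        clean = N_IN_LABEL.sub('', label).strip()
        result[clean] = int(match.group(1)) if match else None
    # Case 2: a row labelled "n" carries the values in its cells.
    clean_labels = list(result)
    for row in rows:
        if row[0].strip().lower() == 'n':
            for clean, cell in zip(clean_labels, row[1:]):
                if result[clean] is None and cell.strip().isdigit():
                    result[clean] = int(cell)
    return result

header = ["Property", "FS (n = 10)", "RS"]
rows = [["n", "", "8"], ["Rin (MΩ)", "120", "210"]]
print(n_per_column(header, rows))  # {'FS': 10, 'RS': 8}
```

The sketch assumes column labels are unique once the annotation is stripped; a real table would need a column index rather than the label as the key.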

Future:
1) Add N and Standard Error (SE) to the exp factor list.
2) Annotate a column/row with N or SE.
3) Propagate the N or SE value through the row and column for each nedm.
4) If N or SE shares a neuron_cm with an nedm, assign it to that nedm.
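A rough sketch of step 4, shown for N only. The `Cell` class, its `kind` tags and `attach_n` are hypothetical stand-ins invented for illustration, not the project's models; `neuron_cm` is represented here by a plain string key, and a single N per neuron type is assumed, as in the "for now" plan.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Cell:
    kind: str                 # 'nedm' or 'n' (hypothetical annotation tags)
    neuron_cm: str            # stand-in key for the neuron concept the cell maps to
    value: float
    n: Optional[int] = None   # filled in by attach_n for nedm cells

def attach_n(cells):
    """Copy each annotated N onto every nedm cell sharing its neuron_cm."""
    # Assumes at most one N per neuron_cm; a later one would overwrite an earlier one.
    n_by_cm = {c.neuron_cm: int(c.value) for c in cells if c.kind == 'n'}
    nedms = [c for c in cells if c.kind == 'nedm']
    for cell in nedms:
        cell.n = n_by_cm.get(cell.neuron_cm)  # None when no N was annotated
    return nedms

cells = [
    Cell('nedm', 'FS', 120.0),
    Cell('nedm', 'FS', -65.0),
    Cell('n', 'FS', 10),
    Cell('nedm', 'RS', 210.0),
]
print([(c.neuron_cm, c.value, c.n) for c in attach_n(cells)])
# [('FS', 120.0, 10), ('FS', -65.0, 10), ('RS', 210.0, None)]
```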