lisc-tools / lisc

Literature Scanner: Automated collection & analyses of the scientific literature.
https://lisc-tools.github.io/
Apache License 2.0

co-occurrence search within the pre-filtered data in Articles #79

Closed guiomar closed 1 year ago

guiomar commented 2 years ago

Hi @TomDonoghue !

As always, super thankful for your very useful tools :)

I was wondering if you could do a co-occurrence search within the pre-filtered data in Articles, or would you have to manage this through the inclusion terms?

In this case for example:

Terms = ['word1','word2','word3'] --> Articles

TermsA = ['X'], InclusionA = ['x1', 'x2', Terms]

TermsB = ['Y'], InclusionB = ['y1', 'y2', Terms]

Terms will look for 'word1' OR 'word2' OR 'word3'. For TermsA with InclusionA, will it look for 'X' AND ('x1' OR 'x2' OR ('word1' OR 'word2' OR 'word3')), or will it instead look for 'X' AND (('x1' OR 'x2') AND ('word1' OR 'word2' OR 'word3'))? How should it be done to obtain the latter case?
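To make the difference between the two readings concrete, here is a small sketch that evaluates both on a few made-up sets of title/abstract words. The documents and the functions `reading_1` and `reading_2` are hypothetical, and plain set checks stand in for an actual PubMed search.

```python
# Hypothetical word lists taken from the example above
TERMS = {'word1', 'word2', 'word3'}
INCLUSIONS = {'x1', 'x2'}

def reading_1(words):
    # 'X' AND ('x1' OR 'x2' OR 'word1' OR 'word2' OR 'word3')
    return 'X' in words and bool(words & (INCLUSIONS | TERMS))

def reading_2(words):
    # 'X' AND ('x1' OR 'x2') AND ('word1' OR 'word2' OR 'word3')
    return 'X' in words and bool(words & INCLUSIONS) and bool(words & TERMS)

# Made-up documents, each reduced to the set of words it contains
docs = {
    'doc_a': {'X', 'x1'},
    'doc_b': {'X', 'word2'},
    'doc_c': {'X', 'x2', 'word3'},
}

for name, words in docs.items():
    print(name, reading_1(words), reading_2(words))
```

All three documents match the first reading, but only `doc_c` matches the second, which is why the second reading is the stricter search.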

Thanks a lot!

TomDonoghue commented 2 years ago

Hey @guiomar - thanks for checking out the tool!

I'm not sure I totally understand what you mean by 'pre-filtered articles', but based on the example, if I'm understanding correctly, you can do what you are trying to do here using multiple terms lists. The Counts object can accept multiple terms lists (each with their own inclusions & exclusions).

To get a search like: 'X' AND (('x1' OR 'x2') AND ('word1' OR 'word2' OR 'word3'))

You can do:

from lisc import Counts

counts = Counts()

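# Dimension A: the term 'X', which must also co-occur with 'x1' OR 'x2'
terms_a = [['X']]
incls_a = [['x1', 'x2']]

# Dimension B: one term with three synonyms, searched as 'word1' OR 'word2' OR 'word3'
terms_b = [['word1', 'word2', 'word3']]

# Add each terms list (and the inclusions for A) to its own dimension
counts.add_terms(terms_a, dim='A')
counts.add_terms(incls_a, 'inclusions', dim='A')
counts.add_terms(terms_b, dim='B')

# Run the co-occurrence searches between A and B, printing each search URL
counts.run_collection(logging='print')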

Note that this is a working example. It uses print logging to print out each URL that gets searched, which is generally a good way to see how the searches get built and to check that they are working.

For example, the above code includes this search (which I think is what you want): https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=0&field=TIAB&retmode=xml&term=("X")AND("x1"OR"x2")AND("word1"OR"word2"OR"word3")
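As a minimal sketch of how that query string is put together, the snippet below rebuilds the printed URL from the term lists: each list becomes a quoted OR group, and the groups are joined with AND. The helpers `or_group` and `build_term` are hypothetical and are not LISC's actual implementation; the base URL and parameters are simply copied from the printed URL above.

```python
def or_group(words):
    """Join a list of words as a quoted OR group, e.g. ("x1"OR"x2")."""
    return '(' + 'OR'.join(f'"{word}"' for word in words) + ')'

def build_term(term, inclusions=None, co_term=None):
    """AND together the term, its inclusions, and the co-occurring term (hypothetical helper)."""
    groups = [or_group(term)]
    if inclusions:
        groups.append(or_group(inclusions))
    if co_term:
        groups.append(or_group(co_term))
    return 'AND'.join(groups)

# Base URL and parameters copied from the printed search URL
BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?'
PARAMS = 'db=pubmed&retmax=0&field=TIAB&retmode=xml&term='

url = BASE + PARAMS + build_term(['X'], ['x1', 'x2'], ['word1', 'word2', 'word3'])
print(url)
```

The printed URL is shown unencoded; if you send such a request yourself, percent-encode the term value (for example with `urllib.parse.quote`).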

Does that cover what you were looking for here? If not, let me know!

TomDonoghue commented 1 year ago

Hey @guiomar - I'm assuming this is now outdated, so I'm going to close it, but if there is anything else, please feel free to reopen it or follow up!

guiomar commented 1 year ago

Hi @TomDonoghue! Thanks a lot! Indeed, we managed to get what we wanted. My apologies for not letting you know; we're always in a rush. Thanks for your useful suggestions, and for all the awesome tools and resources you develop and openly share :)