sanyabt closed this issue 1 year ago
This is a known issue in Reach and is going to be addressed very soon. Once section titles are correctly extracted by Reach I will comment here to let you know. Thanks!
Hi @sanyabt, this was finally addressed in Reach a few days ago, see https://github.com/clulab/reach/pull/775. The new implementation requires some changes on the INDRA side as well: https://github.com/sorgerlab/indra/pull/1399. Once I merge those changes, we can close this issue. After that, you just need to use the latest versions of both systems to get section names.
Awesome, thank you so much! I will look out for the commits.
Done in #1399
Hi @bgyori, I wanted to ask if extracting section_type only works with the "process_nxml_file" function in the REACH API? I've been using "process_text" with a local server which seems to be much faster than "process_nxml_file". The local server gets overloaded, however, when I try to run "process_nxml_file" to get the statements with section_type.
If you are reading plain text, then process_text is the way to go. However, for NXML-formatted content (which is the only input format that carries section information), you need to use one of the NXML-specific functions like process_nxml_file. For a local web server you would call it like this:
from indra.sources import reach
rp = reach.process_nxml_file("input.nxml", url=reach.local_nxml_url)
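Once the statements are available, the section names can be read back from each piece of evidence. The following is a minimal sketch, not the definitive way to do it: it assumes the section name is stored under the "section_type" key of each Evidence object's epistemics dict, which is worth verifying against your installed INDRA version. The helper names section_types, count_section_types and process_local_nxml are hypothetical and only used for illustration.

```python
from collections import Counter


def section_types(statements):
    """Yield (statement, section_type) pairs, one per piece of evidence.

    Assumption: each statement has an `evidence` list whose items carry an
    `epistemics` dict, and the section name (if any) is stored under the
    'section_type' key. Evidence without that key yields None.
    """
    for stmt in statements:
        for ev in stmt.evidence:
            epistemics = getattr(ev, "epistemics", None) or {}
            yield stmt, epistemics.get("section_type")


def count_section_types(statements):
    """Count how many pieces of evidence come from each section."""
    return Counter(section for _, section in section_types(statements))


def process_local_nxml(path):
    """Process one NXML file through a locally running Reach server.

    INDRA is imported lazily so the helpers above can be used without it.
    Returns an empty list if no processor is returned.
    """
    from indra.sources import reach

    rp = reach.process_nxml_file(path, url=reach.local_nxml_url)
    return rp.statements if rp is not None else []
```

For example, count_section_types(process_local_nxml("input.nxml")) would give a quick overview of which sections the evidence came from. A count under None means no section was recorded for that evidence.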
I'm not sure what you mean by the Reach server getting "overloaded": does it respond slowly, or does it crash?
That's exactly how I was trying it! But after about 10 minutes I get this error: "ERROR: [2022-12-07 10:04:49] indra.sources.reach.api - Could not process NXML via REACH service.Status code: 503".
I see, if you attach or email me the NXML file, I can try it out to see what I get.
Oh, I think the problem is two of my NXML files, not the function or the server! I tested on a batch of NXML files just to be sure, and it processes all of them except those two. Sorry about that, and thank you for the quick responses!
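For anyone hitting the same 503 error, a small loop like the one below can help isolate which files in a batch are the problematic ones. This is only a sketch: split_by_outcome is a hypothetical helper, and it assumes the processing function returns None (or raises) when a file cannot be processed, which appears to be what process_nxml_file does after logging the error above.

```python
def split_by_outcome(paths, process):
    """Run `process` on each path and separate successes from failures.

    `process` is any callable taking a file path. A return value of None
    or a raised exception is treated as a failure (an assumption about
    how the underlying reader signals errors).
    """
    ok, failed = {}, []
    for path in paths:
        try:
            result = process(path)
        except Exception:
            result = None
        if result is None:
            failed.append(path)
        else:
            ok[path] = result
    return ok, failed


# Possible usage with a local Reach server (requires INDRA and a running
# server, so it is left commented out here):
#
# from indra.sources import reach
# ok, failed = split_by_outcome(
#     ["a.nxml", "b.nxml"],
#     lambda p: reach.process_nxml_file(p, url=reach.local_nxml_url),
# )
# print("Could not process:", failed)
```

Passing the processing function in as an argument keeps the loop independent of the reader, so the same helper works for process_text or any other entry point.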
Hi, I am trying to extract the section type from the INDRA statement evidence after processing with the REACH processor. However, the value is always null in the extracted statements. Is the section_type field assigned only in particular cases or do I need to set something for it? I can see the section_type in the code here but am not sure when it is assigned. I am using INDRA v1.19.0 and REACH v1.6.3. Thanks!