Handling duplicate statements in the various processors, or in the PySB assembler

sorgerlab / indra

INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system interfacing with NLP systems and databases to collect knowledge, and through a process of assembly, produce causal graphs and dynamical models.

http://indra.bio

BSD 2-Clause "Simplified" License


Handling duplicate statements in the various processors, or in the PySB assembler #20

Status: Closed (johnbachman closed this issue 8 years ago)

johnbachman commented 9 years ago

Queries of BEL or BioPAX can (and do) return multiple equivalent statements, as in the following example. These should be handled in some way: either by incrementing a count of occurrences, or perhaps by adding to a list of papers/references/contexts or other information that may differ between the instances.

    bp = biopax_api.process_pc_neighborhood(['BRAF'])
    bp.get_phosphorylation()
    ...
    Phosphorylation(NRAS, RAF1, PhosphorylationThreonine, 268)
    Phosphorylation(NRAS, RAF1, PhosphorylationThreonine, 268)
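The counting option could be sketched roughly as follows. This is only an illustration: the `Phosphorylation` namedtuple and its field names are hypothetical stand-ins, not INDRA's actual statement class, and `count_duplicates` is a made-up helper.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for a statement object; the field names are assumptions.
Phosphorylation = namedtuple("Phosphorylation", ["enz", "sub", "mod", "pos"])


def count_duplicates(statements):
    """Map each distinct statement to the number of times it occurs."""
    # namedtuples hash and compare by value, so equal statements collapse
    # onto a single Counter key.
    return Counter(statements)


stmts = [
    Phosphorylation("NRAS", "RAF1", "PhosphorylationThreonine", "268"),
    Phosphorylation("NRAS", "RAF1", "PhosphorylationThreonine", "268"),
]
counts = count_duplicates(stmts)
```

This relies on statements being comparable by value; a count alone discards any references or context that differ between the instances.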

bgyori commented 9 years ago

We don't check for duplicate statements, but we do handle duplicate rules (which would be generated from duplicate statements) by displaying a warning in the assembler. I think this might be fine for now, especially since different processors may each collect an equivalent statement and pass them both to the assembler. Currently the warning is triggered only when the automatically generated rule name is a duplicate. We should probably add a deeper equivalence check (whether the two rules really have the same pattern). We could also aggregate the evidence from all the duplicate statements into a single unique rule.
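A minimal sketch of the evidence aggregation idea, assuming statements can be reduced to a structural key. The `Stmt` class, `structural_key`, `merge_duplicates`, and the evidence labels are all hypothetical placeholders for illustration, not INDRA's API or real data.

```python
from dataclasses import dataclass, field


@dataclass
class Stmt:
    """Hypothetical statement: a type, its arguments, and supporting evidence."""
    kind: str
    args: tuple
    evidence: list = field(default_factory=list)

    def structural_key(self):
        # Compare on content rather than on a generated name: two statements
        # with equal keys assert the same thing.
        return (self.kind, self.args)


def merge_duplicates(statements):
    """Collapse structurally equal statements, pooling their evidence."""
    unique = {}
    for stmt in statements:
        key = stmt.structural_key()
        if key not in unique:
            # Copy so the input statements are left unmodified.
            unique[key] = Stmt(stmt.kind, stmt.args, list(stmt.evidence))
            continue
        for ev in stmt.evidence:
            if ev not in unique[key].evidence:
                unique[key].evidence.append(ev)
    return list(unique.values())


# Placeholder statements and evidence labels, made up for this example.
stmts = [
    Stmt("Phosphorylation", ("NRAS", "RAF1", "Threonine", "268"), ["source_a"]),
    Stmt("Phosphorylation", ("NRAS", "RAF1", "Threonine", "268"), ["source_b"]),
    Stmt("Phosphorylation", ("GENE_X", "GENE_Y", "Serine", "1"), ["source_c"]),
]
merged = merge_duplicates(stmts)
```

Keying on the statement's content rather than on a generated name is one way to get the deeper equivalence check; the same approach would apply to rule patterns.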

johnbachman commented 8 years ago

Closed by the implementation of the Preassembler class as of, say, commit c48aff492a.