rockt / SETH

SNP Extraction Tool for Human Variations
rockt.github.com/SETH

Incorrect rsIds and no extracted rsIds #17

Closed: spenagon closed this issue 7 years ago

spenagon commented 7 years ago

We found two problems: in one document rsIds are extracted incorrectly, and in another no rsIds are extracted at all:

docId: 17678724, "Two polymorphisms in MHC2TA gene (rs4,774G/C and rs3,087,456A/G) were studied in two groups". Extracted: rs4 and rs3 instead of rs4774 and rs3087456.
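A minimal sketch of one way to read comma-grouped rsIds such as `rs4,774` and `rs3,087,456` as whole identifiers. This is not SETH's implementation; the class `ThousandsSeparatorRsIds` and its `extract` method are hypothetical. The heuristic assumes that a comma followed by exactly three digits (and no fourth digit) is a thousands separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration, not part of SETH.
public class ThousandsSeparatorRsIds {
    // "rs" followed either by thousands-style digit groups ("4,774", "3,087,456")
    // or by a plain run of digits. (?!\d) rejects a group with a fourth digit.
    private static final Pattern RS_ID =
            Pattern.compile("\\brs(\\d{1,3}(?:,\\d{3})+(?!\\d)|\\d+)");

    static List<String> extract(String text) {
        List<String> ids = new ArrayList<>();
        Matcher m = RS_ID.matcher(text);
        while (m.find()) {
            // Remove the separators so "4,774" becomes "4774".
            ids.add("rs" + m.group(1).replace(",", ""));
        }
        return ids;
    }

    public static void main(String[] args) {
        String text = "Two polymorphisms in MHC2TA gene (rs4,774G/C and rs3,087,456A/G) were studied";
        System.out.println(extract(text)); // [rs4774, rs3087456]
    }
}
```

Because the grouped alternative is tried first, plain identifiers such as `rs1234` still fall through to the `\d+` branch unchanged.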

docId: 22419714, "Patients carrying the TCF7L2_rs7903146_T allele had an increased risk of CRC (P(trend) = 0.02), whereas patients harboring the IL13_rs20541_T allele had a reduced risk (P(trend) = 0.02)". No rsIds were extracted (expected: rs7903146 and rs20541).
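If an extractor anchors rsIds on a regex word boundary, underscore-joined tokens such as `TCF7L2_rs7903146_T` are one plausible reason for a miss; whether SETH behaves this way is not stated in this thread. The sketch below uses a hypothetical class `UnderscoreRsIds` to show the boundary problem and a lookbehind that only rejects a letter or digit directly before `rs`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration, not part of SETH.
public class UnderscoreRsIds {
    // In Java regex "_" is a word character, so "\\brs\\d+" finds no boundary
    // between "_" and "r" in "TCF7L2_rs7903146_T".
    static final Pattern WORD_BOUNDARY = Pattern.compile("\\brs\\d+");

    // Alternative: reject only a letter or digit immediately before "rs",
    // so "_rs7903146" matches but "cars123" does not.
    private static final Pattern RS_ID = Pattern.compile("(?<![A-Za-z0-9])rs(\\d+)");

    static List<String> extract(String text) {
        List<String> ids = new ArrayList<>();
        Matcher m = RS_ID.matcher(text);
        while (m.find()) {
            ids.add("rs" + m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        String text = "carrying the TCF7L2_rs7903146_T allele, whereas patients harboring the IL13_rs20541_T allele";
        System.out.println(WORD_BOUNDARY.matcher(text).find()); // false
        System.out.println(extract(text));                      // [rs7903146, rs20541]
    }
}
```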

We also saw some cases of rsIds preceded by a colon (":").
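The thread does not say where the colon-prefixed rsIds came from or why they were missed. As one illustration under that uncertainty: if text is split on whitespace and each token is checked strictly, a token like `:rs1801133` is rejected (a `\b`-anchored regex would not have this particular problem, since ":" is not a word character). The hypothetical class `ColonPrefixedRsIds` below trims punctuation before the check; the input sentence is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration, not part of SETH.
public class ColonPrefixedRsIds {
    // A strict per-token check rejects ":rs1801133" because of the leading colon.
    static boolean strictIsRsId(String token) {
        return token.matches("rs\\d+");
    }

    // Trim leading and trailing ASCII punctuation (\p{Punct} includes ':', ',' and ')').
    static String trimPunctuation(String token) {
        return token.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
    }

    static List<String> extract(String text) {
        List<String> ids = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            String candidate = trimPunctuation(token);
            if (strictIsRsId(candidate)) {
                ids.add(candidate);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String text = "genotyped variants :rs1801133, :rs4680)";
        System.out.println(strictIsRsId(":rs1801133")); // false
        System.out.println(extract(text));               // [rs1801133, rs4680]
    }
}
```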

Erechtheus commented 7 years ago

Thanks for reporting. I added tests for these cases to the project; they can be found here: https://github.com/rockt/SETH/blob/master/src/test/java/de/hu/berlin/wbi/issues/Issue17Test.java

I also added a unit test for rsIds preceded by a colon.