jameshowison / softcite

Study of software citation in the biology literature

pdf_to_text losing text? #7

Open jameshowison opened 9 years ago

jameshowison commented 9 years ago

2004-46-Nature:60

P31 resembles half of a P3 subunit and can be superposed almost equally well on either of the P3 jellyrolls (82 and 83 Ca atoms superposed for domains 1 and 2, respectively, in each case with an Figure 1 Architecture and structural components of bacteriophage PRD1.

The original full sentence is: "P31 resembles half of a P3 subunit and can be superposed almost equally well on either of the P3 jellyrolls (82 and 83 Cα atoms superposed for domains 1 and 2, respectively, in each case with an r.m.s. deviation of 2.9 Å as determined with program SHP [22]), although there is no sequence similarity between them."

I find the rest of the sentence on lines 76/77 of the extracted text: "articles r.m.s. deviation of 2.9 A˚ as determined with program SHP22), although there is no sequence similarity between them."

I've marked that by reducing the area included. I suspect that the crucial tip-off for the coder is "as determined with program".

yg4886 commented 9 years ago

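This is caused by both pdf_to_text, which drops the figure caption ("Figure 1 Architecture and structural components of bacteriophage PRD1") into the middle of the sentence, and the sentence separation process, which then ends the sentence at the caption's final period. I am not sure how to handle this case. Let’s ask Byron.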
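As a starting point for discussion, here is a rough sketch of how such cases could at least be flagged for manual review. It is not part of the softcite code: `split_at_caption` is a hypothetical helper, and both regular expressions are assumptions (that a caption looks like "Figure N" followed by a capitalized word, and that a finished sentence ends in `.`, `!` or `?`). It only detects the interruption; it does not find the rest of the sentence further down the extracted text.

```python
import re

# Assumption: a caption starts with "Figure <number>" followed by a capitalized
# word, which skips in-text mentions such as "Figure 1 shows ...".
CAPTION_START = re.compile(r"\bFigure \d+ (?=[A-Z])")

# Assumption: a complete sentence ends with ., ! or ?, optionally followed by
# a closing quote or bracket.
SENTENCE_END = re.compile(r"[.!?][\"')\]]?\s*$")


def split_at_caption(text):
    """Return (before, caption) if a caption starts mid-sentence, else None."""
    match = CAPTION_START.search(text)
    if match is None or match.start() == 0:
        return None
    before = text[:match.start()].rstrip()
    if SENTENCE_END.search(before):
        # The sentence finished before the caption, so nothing was cut off.
        return None
    return before, text[match.start():]
```

Applied to the extracted sentence for 2004-46-Nature:60, this would return the truncated sentence ending in "in each case with an" and the caption as separate strings, so the truncated part could be marked as incomplete instead of being presented to the coder as a full sentence.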

(Sent as an email reply to James Howison's comment of Nov 20, 2014, 11:43 AM, which is quoted in full above.)