r03ert0 / brainspell

brainspell is a web platform to facilitate the creation of an open, human-curated, classification of the neuroimaging literature

blob at coordinates 0,0,0 #1

Open pbellec opened 9 years ago

pbellec commented 9 years ago

When running queries such as "Alzheimer" (~280 papers) or "disease" (~780 papers) there is one big blob that seems to be centered at coordinates 0,0,0. It is particularly clear with the "disease" query. It looks like a bug to me, either in the data or the software.

pbellec commented 9 years ago

The "MCI" query has a similar problem.

r03ert0 commented 9 years ago

Hi Pierre,

I found the problem. It's not really a bug in brainspell, but an issue with neurosynth's parser. There are often tables interpreted as stereotaxic coordinates which are in fact something else. I dumped all the coordinates for the papers that respond to search?q=alzheimer, and there are many like this one, for example:

http://brainspell.dev/article/22169204

which have coordinates like these:

15.0 65.0 20.0 8.0 40.0 40.0 6.4 41.0 37.0 4.7 28.0 45.0 3.7 24.0 44.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0

...of course, that makes a pile of rubbish close to coordinate 0,0,0, and that's not the only such paper I found (e.g., http://brainspell.dev/article/19703569, http://brainspell.dev/article/18805495).
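
A table like this could in principle be spotted automatically before a human looks at it. The sketch below is only an illustration, not part of brainspell or neurosynth: the function name `looks_like_bogus_table` and the 0.5 threshold are assumptions made for the example. It flags a table when one identical (x, y, z) row accounts for most of its rows, as with the repeated 1.0 2.0 3.0 above.

```python
from collections import Counter

def looks_like_bogus_table(rows, max_repeat_fraction=0.5):
    """Return True if a single (x, y, z) triple dominates the table.

    `rows` is a list of (x, y, z) tuples. The function name and the
    0.5 threshold are hypothetical choices for this sketch.
    """
    if not rows:
        return False
    # Count how often each exact triple occurs.
    counts = Counter(tuple(r) for r in rows)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(rows) > max_repeat_fraction

# Abbreviated, illustrative version of the table quoted above.
suspect = [(15.0, 65.0, 20.0), (8.0, 40.0, 40.0), (6.4, 41.0, 37.0)] + [(1.0, 2.0, 3.0)] * 10
# Made-up coordinates, only for contrast.
plausible = [(-42.0, 18.0, 24.0), (36.0, -60.0, 48.0), (-6.0, 52.0, 2.0)]

print(looks_like_bogus_table(suspect), looks_like_bogus_table(plausible))
```

Tables flagged this way would still need a human check; the heuristic only narrows down where to start.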

The only solution I see for the moment is manual curation... to go and tag all those tables as incorrect. I'm also working on implementing a way of manually editing/correcting the tables. That could also help.

Finally, the larger the list of articles responding to a query, the more likely it is that you'll get wrong tables in the middle...
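
To put a rough number on that: if each article independently had some probability p of carrying a wrongly parsed table, the chance of at least one bad table among n articles would be 1 - (1 - p)^n. The value p = 0.01 below is purely an assumption for illustration, not a measured error rate, and independence is assumed too; the list sizes are the approximate ones reported at the top of this thread.

```python
def prob_at_least_one_bad(n_articles, p_bad):
    """Chance that at least one of n independent articles has a bad table."""
    return 1.0 - (1.0 - p_bad) ** n_articles

P_BAD = 0.01  # hypothetical per-article error rate, not measured

for query, n in [("Alzheimer", 280), ("disease", 780)]:
    print(f"{query}: {prob_at_least_one_bad(n, P_BAD):.3f}")
```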

hope this helps.

best, roberto

pbellec commented 9 years ago

I see. I didn't realize it wasn't possible to manually edit the table. That would most definitely be useful. And we should let Tal know about this. As I told you, we are planning a brainspell Alzheimer/MCI tag sprint here in Montreal, and we could work on fixing at least these papers.

r03ert0 commented 9 years ago

The problem with manually editing the tables is how to deal with multiple edits (the worst case being vandalism). I think that as a first approach I'll just assume that people are good :)

pbellec commented 9 years ago

Indeed. Also, it is unclear to me how you are going to keep the database updated. Will you merge with new releases from neurosynth? If so, how will you deal with conflicts?

r03ert0 commented 9 years ago

On Mon, Oct 20, 2014 at 10:50 PM, Pierre Bellec notifications@github.com wrote:

Indeed. Also, it is unclear to me how you are going to keep the database updated. Will you merge with new releases from neurosynth? If so, how will you deal with conflicts?

pbellec commented 9 years ago

At this stage I'd indeed be surprised if you ran into trouble by assuming "users are good". You could always add some moderation mechanism down the road.

On Mon, Oct 20, 2014 at 11:27 PM, Roberto Toro notifications@github.com wrote:

  • the DB is updated manually whenever more neurosynth data becomes available (for the most recent papers, pubmed metadata is sometimes unavailable, so the DB needs frequent updates in any case)
  • ideally, neurosynth should be just one more user, and its updates treated as such (for many values, all votes/tags from all users are kept). But for manually entered data, I think that human input should be given precedence over algorithmic input (such as neurosynth), while keeping the 'flagging' mechanism to report errors. Finally, for humans overwriting humans or algorithms overwriting algorithms, I would apply the same principle as before, and assume that the last editor was correcting the previous one (i.e., that users are good).
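
The precedence rule described in these two points can be sketched as follows. This is only one reading of the policy, not brainspell's actual code: the `Edit` record, its field names, and `resolve` are hypothetical. Human edits beat algorithmic ones; within the same kind of source, the most recent edit wins.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edit:
    author: str
    is_human: bool   # False for algorithmic sources such as neurosynth
    timestamp: int   # any value that increases with time
    value: str

def resolve(edits):
    """Pick the edit to display: humans first, then the latest edit."""
    if not edits:
        return None
    # Tuples compare element-wise, so is_human dominates and the
    # timestamp breaks ties among edits of the same kind.
    return max(edits, key=lambda e: (e.is_human, e.timestamp))

# Hypothetical edit history for one table.
edits = [
    Edit("neurosynth", False, 3, "table v3 (parsed)"),
    Edit("alice", True, 1, "table fixed by hand"),
    Edit("bob", True, 2, "table fixed again"),
]
winner = resolve(edits)
print(winner.author, winner.value)
```

Nothing is deleted in this sketch: all edits stay in the list and only the displayed value changes, so a flagged error could still be traced back to the edit that introduced it.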

r03ert0 commented 9 years ago

is it ok to close this issue?

amanbadhwar commented 9 years ago

With regard to this issue: I am currently adding ~30 articles to brainspell, so I will echo the sentiment that my 'human' input should be given precedence over an algorithm (at least for these articles).

Cheers, Aman