materialsintelligence / matscholar-web

Code for the Materials Scholar website
http://matscholar.com
MIT License

moar abstracts! #68

Open: computron opened this issue 5 years ago

jdagdelen commented 5 years ago

@LeighWeston86 Let me know if you need anything on my end. Once you're ready I'll also need to update the Elasticsearch database with the new entries.

LeighWeston86 commented 5 years ago

Still working on it. I'm having some issues with MPI on Cori. I'll let you know when it's done.

AmalieT commented 5 years ago

We have 5 million of them now. I don't know if @computron is still feeling moarish about them.

computron commented 5 years ago

I am still feeling moarish; I feel like a lot of the "important" ones must still be missing.

For example, a materials search for "thermoelectrics" still shows only 200 hits for PbTe. There must be something like 10,000 in reality (Google Scholar yields 18,000 hits).

ardunn commented 5 years ago

yeah also rn afaik we do not have any Nature journals...

computron commented 5 years ago

@jdagdelen any thoughts?

jdagdelen commented 5 years ago

We actually have Springer Nature journals, but they aren't showing up in the journal stats endpoint because that field is empty for about 2.5 million entries. I have been running a script for the last few days to fix this, and it's nearly done.

jdagdelen commented 5 years ago

Also, I only see < 1,000 results for PbTe on Google Scholar. After that, it just shows blank pages. Maybe their definition of "result" is different from ours?

jdagdelen commented 5 years ago

Never mind. I think they only show 100 pages (1000 results) and then stop serving more.

ardunn commented 5 years ago

@jdagdelen yeah, the rester only returns 100 DOIs per entity now, which doesn't make sense to me. Why don't I get all of them?

Edit: whoops, totally misinterpreted what you wrote lol. Disregard.

(Reply sent by email to John Dagdelen's comment of Mon, Oct 7, 2019, 1:14 PM.)

ardunn commented 5 years ago

I mean, Google Scholar says it has 52k results...

computron commented 5 years ago

Yeah, I meant "PbTe thermoelectric" has 18K results.

Anyway, regardless, it is undeniable that a lot of material is missing, and that this is due to missing abstracts in the database.