lizzieinvancouver / egret

Data entry loose ends #6

Open DeirdreLoughnan opened 8 months ago

DeirdreLoughnan commented 8 months ago

Thank you everyone for your great work entering data. We are so close to being done, but there still seem to be some loose ends. Please review the issues below with your assigned papers and either update your source tabs with the required information or send me an email about any issues you are still having (e.g. still waiting on ILL) by next Wednesday, Oct 25.

I can't find data, or a justification for why data were not scraped, for the following papers:

@buniwuuu: lee93

@christophe-rd: ikeda01

@kim-dajeong: barnhill82 jensen09 joshi03

@soleil-nocturne123: downie98 edwards96 elisafenko15 fetouh14 feurtado05

@ngoj1: can you confirm that you were simply not able to get pdf copies of the following: yahyaoglu06 Tashev08 racek07 povoa09 tanuja20

@ngoj1 It also appears that datasetIDs were incorrectly copied for a few of your studies (moeini21 and sundaramoorthy93). Could you double-check and correct these accordingly?

@toluam: it says in your source tab that the following required ILL. But I still don't see any data. Are we still waiting on these ILL? Chien09 Chung91 Denny04 Kazinczi98 Morozowska02 Morris16 Mughal07 Mughal10 Mulaudzi09 Mutele15 Nasri14 Santos19 Schafer89

@toluam The new egret Drive folder also does not appear to have any of your pdfs added. Do you still have copies of the pdfs you scraped? If so, could you add them?

ngoj1 commented 8 months ago

Just fixed the year numbers for those two papers. Those other 5 are papers that I couldn't track down to a source, and I don't recall ever hearing back about the ILLs made for them. I'll dig again to see if I just missed them, then resubmit ILL requests and hope that, even without a verifiable source, the library will still be able to find the papers.

Downie98 and edwards96 from Sylvia were passed on to me to scrape, which are my two most recent papers.

DeirdreLoughnan commented 8 months ago

@ngoj1 Awesome, thanks for getting to this so quickly! If you have already submitted an ILL for them, I don't think it is worth your time doing one again. But if you are not sure an ILL was ever made, making one now sounds great.

ngoj1 commented 8 months ago

I've just made the ILLs for the ones I couldn't find, hopefully the library can figure something out. I might have just forgotten to send ILLs for these, so I'm so sorry about that! In my experience they always got back to me within a week.

I didn't scrape tanuja20 because they called the plant a medicinal herb, and to my knowledge Polygonatum verticillatum is cultivated to some degree. I might be hallucinating but I can't remember what the verdict was on scraping papers for medicinal plants.

I also looked over my source tab since Tolu passed some papers to me to scrape;

buniwuuu commented 8 months ago

lee93 is in Korean! Just pushed the updated file. I also need to scrape elisafenko15 and maithani90 from Sylvia. I will get them done by the end of the weekend!

soleil-nocturne123 commented 8 months ago

> lee93 is in Korean! Just pushed the updated file. I also need to scrape elisafenko15 and maithani90 from Sylvia. I will get them done by the end of the weekend!

Maithani90 was at WW the first time I looked, but it wasn't there anymore the few times I came back to scan it.

ngoj1 commented 8 months ago

Update:

DeirdreLoughnan commented 8 months ago

@ngoj1 @buniwuuu @soleil-nocturne123 Thanks everyone for working on this!

@ngoj1 keep me updated on the ILL, hopefully you hear back soon!

Regarding Polygonatum verticillatum, I think we should scrape, but add a note that it is a medicinal plant.

  • chien09 was scraped by me with no issues

I still don't see data from chien09 in your data, am I looking at the right file: egret_JN_18.10.2023.xlsx?

  • kazinczi98 is in Hungarian so no scrape

Could you update the language column in your source tab with this (and do the same for wang15 and yahyaoglu06)? In general all notes about language should be made in this column.

I am sure I will have more questions this week and appreciate all your help getting this tied up!

DeirdreLoughnan commented 8 months ago

@ngoj1 I have a few more questions for you regarding egret papers and since your tasks are wrapping up, I was hoping I could get your help doing some ILLs.

It seems like some of Tolu's papers were reassigned to you. Do you know anything about the status of: denny04 kazinczi98 masoomeh09 morozowska02

Could you help me track down some other papers? There are notes that these were not easy to find, but it is unclear whether ILL were ever made. mulaudzi09 nasri14 santos19 liu04 liu09 markovic20 tashev08 weerakoon81 wagner07 zhangNA zhaoNA (there may be insufficient source information to find these last two)

For now you can just try to get pdfs of these papers; we can discuss who has time to scrape them later.

Thanks for your help!

kim-dajeong commented 8 months ago

@DeirdreLoughnan Sorry I didn't realise those three papers were mine because the crops column was already filled out and I didn't look closely enough :( I've submitted ILLs for them now, although one of them (barnhill82) doesn't seem to exist...

DeirdreLoughnan commented 8 months ago

@kim-dajeong thanks for looking into them. If the data is for crops, then we don't want to scrape it. But it is still important to update your source tab, changing the A to an R in the "accept_reject" column and setting "reason_reject" to something like "crop" or "paper not available with ILL".

When you say it does not exist, what do you mean? Did you submit an ILL and they could not find it?
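
A minimal sketch of how the source-tab rule above could be checked automatically. It assumes the source tab has been read into a list of dictionaries, one per paper. Only the column names "accept_reject" and "reason_reject" come from this thread; the function name, the "datasetID" key, and the example rows are hypothetical.

```python
# Hypothetical check for the source tab; only the column names
# "accept_reject" and "reason_reject" come from the thread.
def find_source_tab_problems(rows):
    """Return (datasetID, message) pairs for rows that break the rule."""
    problems = []
    for row in rows:
        dataset_id = row.get("datasetID", "")
        decision = (row.get("accept_reject") or "").strip().upper()
        reason = (row.get("reason_reject") or "").strip()
        if decision not in {"A", "R"}:
            # Every paper should be marked accepted (A) or rejected (R).
            problems.append((dataset_id, "accept_reject should be A or R"))
        elif decision == "R" and not reason:
            # A rejected paper needs a reason such as "crop".
            problems.append((dataset_id, "rejected paper needs a reason_reject"))
    return problems


# Made-up rows for illustration only.
example_rows = [
    {"datasetID": "paperA01", "accept_reject": "A", "reason_reject": ""},
    {"datasetID": "paperB02", "accept_reject": "R", "reason_reject": "crop"},
    {"datasetID": "paperC03", "accept_reject": "R", "reason_reject": ""},
    {"datasetID": "paperD04", "accept_reject": "", "reason_reject": ""},
]

for dataset_id, message in find_source_tab_problems(example_rows):
    print(f"{dataset_id}: {message}")
```

Running it on the made-up rows flags only the rejected paper with no reason and the paper with no decision at all.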

ngoj1 commented 8 months ago

Hi Deirdre,

I accidentally wrote the dataset ID as chen08 instead of chien09, just pushed that. Sorry for the confusion!

I also just noticed I had li11 as "cannot find", but now that I know Woodward can manage to pull these from the depths of somewhere I've just submitted an ILL and will see if they can get it for me even if there's no source to go from.

As for Tolu's papers, I only recognize denny04 and kazinczi98 from those four. kazinczi98 is in Hungarian (updated in my source tab), and for denny04 I remember going to Woodward with her to split up and look for the papers separately, so I'll text her and see if she has the document. I'll also ask her about masoomeh09 and morozowska02.

For that list of hard-to-find papers:

kim-dajeong commented 8 months ago

> @kim-dajeong thanks for looking into them. If the data is for crops, then we don't want to scrape it. But it is still important to update your source tab, changing the A to an R in the "accept_reject" column and the "reason_reject" to something like "crop" or "paper not available with ILL" etc.
>
> When you say it does not exist, what do you mean? Did you submit an ILL and they could not find it?

I was looking for the journal volume online and it didn't seem to exist, but I went to Woodward and they had the volume from 1982, so I'll be scraping that this week.

DeirdreLoughnan commented 8 months ago

@kim-dajeong That is great to hear! I am glad you were able to find it!

@ngoj1 Thanks for your help tracking down these last few elusive papers!

ngoj1 commented 8 months ago

Hi Deirdre,

Tolu just told me that Mulaudzi09 is already scraped and in her most recent egret data file (24/10/2023). Masoomeh09 and Morozowska02 are available through ILL, but it's already past the deadline for receiving them so I just put in new ones for them. Now that I recall, Denny04 was one of those that only showed the abstracts. I found the original "article" here, for reference: https://journals.ashs.org/hortsci/view/journals/hortsci/39/4/article-p787E.xml?rskey=P8ueEt&result=1&tab_body=pdf

Tolu also already scraped nasri14 and santos19 in her data sheet. As for the others:

DeirdreLoughnan commented 8 months ago

@ngoj1 Thanks for your help with this!

All our papers simply came from a search of Web of Science for our specified terms. If there is not sufficient information to find ZhangNA and ZhaoNA, that is fine. That was the case for a few other papers as well. It might be that they were not publications, but posters or conference proceedings that snuck into our list somehow.

It sounds like we are just waiting on ILL for: Masoomeh09 Morozowska02 tashev08 weerakoon81

Once you have the pdfs, let me know and I will follow up with assignments for scraping them.

soleil-nocturne123 commented 8 months ago

@DeirdreLoughnan Sorry for the late response. Regarding the following papers,

  1. downie98 & edwards96: Justin helped me with this but if I remember correctly, the figures weren't really great so we decided not to scrape them

  2. elisafenko15: I asked Britany to help me with this paper, but could you skim through it to see if we should scrape it?

  3. fetouh14: I changed this paper to R as it is a greenhouse experiment

  4. feurtado05: I cannot find this paper but there are 2 papers published by the same author and of the same topic. Should I scrape one of them?

DeirdreLoughnan commented 8 months ago

@soleil-nocturne123 Thanks for the reply, I know you have been busy with midterms!

  1. downie98 & edwards96: It looks like Justin scraped them. By not great, do you mean the curvature we get because of the way the page is scanned? I think the figures for Downie look ok to scrape, but agree with Justin's decision to just get data from the table for Edwards. Great work!
  2. elisafenko15: @soleil-nocturne123 @buniwuuu yes I would scrape it. It is wild that the figure has no axis labels, but I think it is fair to assume that the y is percent germination and the x is day. I would make note of this and any other assumptions you make when you scrape it.
  3. fetouh14: it is ok if the experiment was done in a greenhouse, as long as they were manipulating some sort of environmental factor. What were the treatments?
  4. feurtado05: Have you tried doing an ILL for it? We want to make sure our methods are standardized, so we don't need to scrape papers that were not selected based on our search criteria.

soleil-nocturne123 commented 8 months ago

@DeirdreLoughnan

fetouh14: The treatment is cold stratification. What made me reject this paper is the seed cultivation section: "Every 30 days seeds were removed from the cooler and sowed in polyethylene bags 10 x 12 cm filled with a light compost of two parts peat, one part loam and one part sand. Each treatment contained seven replicates of 45 bags each were arranged in a shallow tunnel covered with polyethylene until germination ceased. The experiment was conducted under saran greenhouse." Do you think I should still scrape it?

feurtado05: I did, but no papers came back. I suspect the paper might actually have been sent/published sometime between 2004 and 2005.

ngoj1 commented 8 months ago

Sadly, Weerakoon81 is another one of those that just has the abstract. Tanuja20 is also unavailable; Woodward says they couldn't find a single library in their network that carried it. My source tab has been updated to match this.

Therefore, still waiting on:

Currently have pdf for:

DeirdreLoughnan commented 8 months ago

Thanks @ngoj1 for the update.

It looks like we have a slight issue with the copying of the datasetID for aldridge1992 again. No rush, but could you correct this the next time you work on egret?

DeirdreLoughnan commented 7 months ago

Hi @ngoj1 @soleil-nocturne123 @buniwuuu @kim-dajeong

I wanted to check the status of everyone's last few papers. Is anyone still working on scraping their papers, or have all recent papers been finished and pushed to the repo?

If you are still working on some, could you let me know the status or try to push what you can by Sunday night (Nov 26)?

@ngoj1 did you hear back about the last three ILLs you submitted?

ngoj1 commented 7 months ago

I just got Morozowska02 this morning, so the ones that we still need to scrape are:

I can take care of the first two, if anyone would like to do Morozowska02? The other ones that I requested ILLs for either didn't have any suppliers or were not in English, and all that information is in the source sheet that I just pushed.

buniwuuu commented 7 months ago

I finished the last two from Sylvia!

ngoj1 commented 7 months ago

Hello, I may be going crazy and/or blind but does anyone see where the numbers I to IV are explained in the Tashev08 paper? I've attached the confusing plot in question and also the paper in this comment. In the meantime I'll work on Morozowska02 since it's pretty short.

tashev08.pdf

image

soleil-nocturne123 commented 7 months ago

I have just updated the final paper!

soleil-nocturne123 commented 7 months ago

> Hello, I may be going crazy and/or blind but does anyone see where the numbers I to IV are explained in the Tashev08 paper? I've attached the confusing plot in question and also the paper in this comment. In the meantime I'll work on Morozowska02 since it's pretty short.

Yah, I couldn't find the description for those treatments either.

DeirdreLoughnan commented 7 months ago

Thanks everyone for confirming that things have been updated!

And thanks @ngoj1 for working on these last couple of papers.

I agree that the data in Tashev08 is unclear. My best guess is that the table is showing each replicate for the A and B variant (methods mention 5 x 50, so perhaps this is the five). Maybe hold off on scraping it for tonight and I will bring it up in the meeting with Lizzie tomorrow.

DeirdreLoughnan commented 7 months ago

@ngoj1 We discussed the Tashev08 paper and agree that there is no indication what the numbered columns mean, so we can't use the data and there is no need to scrape it.