weecology / retriever

Quickly download, clean up, and install public datasets into a database management system
http://data-retriever.org

Add plant growth form dataset #639

Open ethanwhite opened 8 years ago

ethanwhite commented 8 years ago

http://onlinelibrary.wiley.com/doi/10.1002/ecy.1569/full

henrykironde commented 8 years ago

@ethanwhite, are you working on this now? Just wondering, because I am working on some other data and I could look at this afterwards.

ethanwhite commented 8 years ago

No, I'm not working on it. Just saw it had come out and thought it would be useful to add. No rush on this.

henrykironde commented 8 years ago

Ok, thanks. I was puzzled by the date, 27 August 2016, because I was pretty sure we did not have any data papers for 2016 yet. Based on Ecological Archives, I was wondering why this is not yet in the database here.

ethanwhite commented 8 years ago

When the journal Ecology transitioned to being published by Wiley they stopped adding new data papers to Ecological Archives and started posting them as supplemental material on Wiley's site. So, the data for this paper is located at:

http://onlinelibrary.wiley.com/store/10.1002/ecy.1569/asset/supinfo/ecy1569-sup-0001-DataS1.zip?v=1&s=219c74fa85bfe6738e649bedd844f40428351802

I have no idea how stable this URL is, and in general I think this move to supplementary material is a terrible idea. I've asked higher-ups at ESA about it, but after initially telling me they were looking into it, they've gone quiet. We should test this URL over the next couple of weeks to see whether it's stable, so we know whether we can support these datasets in their current state.
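Testing stability over a couple of weeks could be as simple as requesting the URL on a schedule and recording the HTTP status each time. Below is a minimal sketch using only Python's standard library. `check_url` is a hypothetical helper, and using a HEAD request (so the zip is not downloaded on every check) is an assumption of this sketch, not something the thread specifies.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=30):
    """Return the HTTP status code for url, or None if it could not be reached.

    Sends a HEAD request so the archive itself is not downloaded.
    """
    try:
        # Request() raises ValueError for malformed URLs, so build it inside try
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 404
        return err.code
    except (urllib.error.URLError, ValueError, OSError):
        # DNS failure, refused connection, timeout, or a malformed URL
        return None
```

Running `check_url(...)` on the DataS1 zip URL once a day (for example from cron) and logging the result would show whether the link keeps returning 200. Some servers reject HEAD requests (e.g. with 405), in which case a plain GET would be needed instead.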

henrykironde commented 8 years ago

Okay, thanks.

henrykironde commented 8 years ago

This has data issues similar to those in "The thermal dependence of biological traits" by Dell. I will look into it and improve the script later.

ethanwhite commented 8 years ago

Sounds good. No rush on this one at all.

shreyneil commented 6 years ago

Was this solved? If not, how should I go about it, @ethanwhite sir?

henrykironde commented 6 years ago

Not yet. We had issues with this data back then. I made a script in e9743b5, but I am not sure whether it still works.

But you could try writing and testing a new JSON script.

ethanwhite commented 6 years ago

@shreyneil - definitely give it a go if you'd like. Thanks!

henrykironde commented 6 years ago

@shreyneil,

{
  "citation": "Engemann, K., Sandel, B., Boyle, B. L., Enquist, B. J., Jorgensen, P. M., Kattge, J., McGill, B. J., Morueta-Holme, N., Peet, R. K., Spencer, N. J., Violle, C., Wiser, S. K. and Svenning, J.-C. (2016), A plant growth form dataset for the New World. Ecology. Accepted Author Manuscript. doi:10.1002/ecy.1569",
  "description": "This dataset provides growth form classifications for 67,413 vascular plant species from North, Central, and South America.",
  "homepage": "http://onlinelibrary.wiley.com/doi/10.1002/ecy.1569/full",
  "keywords": [
    "Taxon",
    "Plants"
  ],
  "name": "plant_growth_form_dataset",
  "archived": "zip",
  "resources": [
    {
      "path": "DataS1/GrowthForm_Final.txt",
      "dialect": {
        "header_rows": 1
      },
      "name": "GrowthForm_Final",
      "schema": {
        "fields": [
          {
            "name": "FAMILY_STD",
            "type": "char"
          },
          {
            "name": "SPECIES_STD",
            "type": "char"
          },
          {
            "name": "GROWTHFORM_STD",
            "type": "char"
          },
          {
            "name": "GROWTHFORM_DIV",
            "type": "char"
          },
          {
            "name": "ID",
            "type": "char"
          },
          {
            "name": "CONSENSUS",
            "type": "double"
          },
          {
            "name": "SOURCES",
            "type": "double"
          }
        ]
      },
      "url": "http://onlinelibrary.wiley.com/store/10.1002/ecy.1569/asset/supinfo/ecy1569-sup-0001-DataS1.zip?v=1&s=219c74fa85bfe6738e649bedd844f40428351802"
    }
  ],
  "retriever": "True",
  "retriever_minimum_version": "2.1.dev",
  "title": "A plant growth form dataset for the New World",
  "version": "1.0.0"
}

The paths you have are wrong; here is a sample using one resource.
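Since the problem was the resource paths, one quick check before running the script is to compare each `path` in the JSON against the member names actually stored in the downloaded zip. A minimal sketch, assuming you have a local copy of the archive; `missing_resource_paths` is a hypothetical helper, not part of retriever, and treating the archive member name (e.g. `DataS1/GrowthForm_Final.txt`, as in the sample) as the expected `path` is an assumption.

```python
import json
import zipfile


def missing_resource_paths(script_json, archive):
    """Return the resource paths in a retriever JSON script that are not in the zip.

    script_json is the JSON text; archive is a path to, or file-like object
    holding, a local copy of the zip file.
    """
    script = json.loads(script_json)
    with zipfile.ZipFile(archive) as zf:
        members = set(zf.namelist())
    return [resource["path"] for resource in script.get("resources", [])
            if resource["path"] not in members]
```

An empty list means every resource path matches a file in the archive; anything returned is a path that needs fixing.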

henrykironde commented 6 years ago

@shreyneil, thank you for your contributions on this issue. Please close PRs #1102 and #1108 and create a new PR. Below is working code revised from the contributions you made in both of those PRs. Go ahead and make a new PR, test the code, and make sure it works. If it fails, report the error you are getting. Also, let's use the word "Review" only for PRs that the contributor has tested and believes reach the expected goal. That way, reviewers can easily tell when code is ready to test, or can look at the logic when errors are reported.

Again, thanks for working on this; it took us a long time to get this dataset in 👍.
Note: the file Growth..Scheme is encoded in UTF-16.
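Outside retriever, the same UTF-16 to UTF-8 conversion can be done with plain Python file handling. A minimal sketch; `convert_utf16_to_utf8` is a hypothetical helper. Unlike the script's `strip()`, it removes only the line ending, an assumption made here so that empty trailing tab-separated fields are not lost.

```python
def convert_utf16_to_utf8(src_path, dest_path):
    """Re-encode a UTF-16 text file as UTF-8 with "\n" line endings.

    Python's "utf-16" codec reads the byte-order mark to detect endianness.
    """
    with open(src_path, encoding="utf-16") as src, \
            open(dest_path, "w", encoding="utf-8", newline="\n") as dest:
        for line in src:
            # Drop only the newline so trailing empty tab-delimited fields survive
            dest.write(line.rstrip("\n") + "\n")
```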

#retriever
from retriever.lib.templates import Script
from retriever.lib.models import Table, Cleanup, correct_invalid_value
from pkg_resources import parse_version

try:
    from retriever.lib.defaults import VERSION
    try:
        from retriever.lib.tools import open_fr, open_fw, to_str
    except ImportError:
        from retriever.lib.scripts import open_fr, open_fw
        to_str = str  # older versions have no to_str; fall back to str
except ImportError:
    from retriever import open_fr, open_fw, VERSION
    to_str = str


class main(Script):
    def __init__(self, **kwargs):
        Script.__init__(self, **kwargs)
        self.name = "plantf"
        self.ref = "http://onlinelibrary.wiley.com/doi/10.1002/ecy.1569/full"
        self.urls = {"plantgrowth": "http://onlinelibrary.wiley.com/store/10.1002/ecy.1569/asset/supinfo/ecy1569-sup-0001-DataS1.zip?v=1&s=219c74fa85bfe6738e649bedd844f40428351802"}
        self.citation = ("Engemann, K., Sandel, B., Boyle, B. L., Enquist, B. J., "
                         "Jorgensen, P. M., Kattge, J., McGill, B. J., Morueta-Holme, N., "
                         "Peet, R. K., Spencer, N. J., Violle, C., Wiser, S. K. and "
                         "Svenning, J.-C. (2016), A plant growth form dataset for the "
                         "New World. Ecology. Accepted Author Manuscript. "
                         "doi:10.1002/ecy.1569")
        self.tags = ["Taxon", "Plants"]
        self.retriever_minimum_version = "2.0.dev"
        self.script_version = 1.0
        self.description = ("This dataset provides growth form classifications for "
                            "67,413 vascular plant species from North, Central, "
                            "and South America.")

    def download(self, engine=None, debug=False):
        Script.download(self, engine, debug)
        engine = self.engine

        # Download the zip and extract the three tab-delimited files
        file_names = ["DataS1/Growthform_Scheme.txt",
                      "DataS1/GrowthForm_Initial.txt",
                      "DataS1/GrowthForm_Final.txt"]
        engine.download_files_from_archive(self.urls["plantgrowth"], file_names,
                                           filetype="zip")

        # 2.1.dev and later refer to extracted files by their archive path
        # (DataS1/...); earlier versions use the bare file name. Compare the
        # parsed versions, not their strings: as strings, "2.1.0" < "2.1.dev0".
        prefix = "DataS1/" if parse_version(VERSION) >= parse_version("2.1.dev") else ""

        # Growthform_Scheme.txt is UTF-16 encoded: write a re-encoded copy
        # (Growthform_Sch.txt) in the default encoding and load that instead
        scheme_file = prefix + "Growthform_Sch.txt"
        old_data = open_fr(engine.find_file("DataS1/Growthform_Scheme.txt"),
                           encoding="UTF-16")
        new_data = open_fw(engine.format_filename(scheme_file))
        for line in old_data:
            new_data.write(to_str(line).strip() + "\n")
        new_data.close()
        old_data.close()

        # Create and populate one table per file
        tables = [("Growthform_Scheme", scheme_file),
                  ("GrowthForm_Initial", prefix + "GrowthForm_Initial.txt"),
                  ("GrowthForm_Final", prefix + "GrowthForm_Final.txt")]
        for table_name, filename in tables:
            engine.auto_create_table(Table(table_name, delimiter="\t"),
                                     filename=filename)
            engine.insert_data_from_file(engine.format_filename(filename))


SCRIPT = main()
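The script chooses file paths based on the installed retriever version, so the comparison has to order versions numerically. `parse_version` objects compare correctly, but their string forms do not: `"2.1.0" >= "2.1.dev0"` is False because `'0'` sorts before `'d'`. The standard-library sketch below reproduces the intended ordering for illustration only; `version_key` is a hypothetical helper that implements a simplified subset of PEP 440 (dotted integers plus an optional `.devN` suffix), not a replacement for `parse_version`.

```python
import re


def version_key(version):
    """Hypothetical helper: map "2.1.0" or "2.1.dev0" to a sortable tuple.

    Numeric parts compare as integers, and a ".devN" release sorts before
    the final release with the same number (simplified PEP 440 ordering).
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:\.dev(\d*))?", version)
    if match is None:
        raise ValueError("unsupported version string: %r" % version)
    release = tuple(int(part) for part in match.group(1).split("."))
    # Pad so that "2.1" and "2.1.0" compare equal
    release += (0,) * (4 - len(release))
    if match.group(2) is None:
        # Final release: sorts after every dev release of the same number
        return release + (1, 0)
    return release + (0, int(match.group(2) or 0))


print("2.1.0" >= "2.1.dev0")                            # False: string comparison
print(version_key("2.1.0") >= version_key("2.1.dev"))   # True
```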
henrykironde commented 6 years ago

Note: the pasted code still needs cleaning up; make sure the fields (self.ref, self.url, etc.) are correct.
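One way to confirm the fields are correct is to compare the field names in the JSON schema (FAMILY_STD, SPECIES_STD, GROWTHFORM_STD, and so on) against the header row of the extracted file. A minimal sketch, assuming the file is tab-delimited with a single header row (as `"header_rows": 1` and `delimiter="\t"` in the thread suggest) and that its encoding is passed in correctly; `check_header` is a hypothetical helper.

```python
def check_header(data_path, expected_fields, delimiter="\t", encoding="utf-8"):
    """Compare a delimited file's header row with the expected field names.

    Returns (missing, unexpected): expected names absent from the header, and
    header names not in expected_fields. Matching is case-sensitive.
    """
    with open(data_path, encoding=encoding) as data_file:
        header = data_file.readline().rstrip("\r\n").split(delimiter)
    missing = [name for name in expected_fields if name not in header]
    unexpected = [name for name in header if name not in expected_fields]
    return missing, unexpected
```

Two empty lists mean the schema and the file agree on field names; the check does not verify column order or types.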