NeuroCausal: data-storage

This repository will list the parametric files that are required for the analysis pipeline to run. We will also share the OSF link to the downloaded open-access clinical papers that will be used in the NeuroCausal pipeline.

The dataset will be updated periodically as new batches of data are added and new features are added to the tool.

The current data resides in the NeuroCausal OSF Repository and covers the specific queries (aphasia + neurodegeneration and aphasia + fMRI) that we ran with the NeuroQuery pipeline.

The folder structure of the shared data is:

query-aphasia_neurodegenerative.zip
  └── query-aphasia_neurodegenerative
      ├── articlesets
      │   ├── articleset_00000.xml
      │   └── info.json
      └── articles
          ├── 000
          │   └── pmcid_4382926.xml
          ├── 00a
          │   └── pmcid_8317687.xml
          ├── 00b
          │   └── pmcid_6625472.xml
          ├── ...
          ├── e12
          │   └── pmcid_8832765.xml
          ├── ...
          ├── info.json
          └── subset_extractedData
              ├── authors.csv
              ├── coordinates.csv
              ├── info.json
              ├── metadata.csv
              └── text.csv

The articlesets folder contains the raw XML file that holds the bulk of the papers downloaded with the query terms. The articles folder contains one XML file for each downloaded paper. The subset_allArticles_extractedData folder (shown as subset_extractedData in the tree above) contains the extracted data, split into separate CSV files by type of information.
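
As an illustration of the layout described above, the following is a minimal sketch for listing the per-article XML files once the archive has been unzipped. It assumes the bucket folders (000, 00a, ...) sit directly under articles and that file names follow the pmcid_<number>.xml pattern shown in the tree; the function name iter_article_files is hypothetical and not part of the NeuroCausal tooling.

```python
from pathlib import Path


def iter_article_files(query_dir):
    """Yield (pmcid, path) for each per-article XML file under query_dir/articles.

    Assumes the layout shown in the tree: articles/<bucket>/pmcid_<number>.xml.
    """
    articles_dir = Path(query_dir) / "articles"
    # Only look one level down, inside the bucket folders, so that
    # info.json and the extracted-data folder are skipped.
    for xml_path in sorted(articles_dir.glob("*/pmcid_*.xml")):
        # "pmcid_4382926" -> 4382926
        pmcid = int(xml_path.stem.split("_", 1)[1])
        yield pmcid, xml_path
```

Calling list(iter_article_files("query-aphasia_neurodegenerative")) on the unzipped folder would return one (pmcid, path) pair per article, ordered by bucket folder.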

metadata.csv contains one row per article, with the following metadata: pmcid (PubMed Central ID), pmid (PubMed ID), doi, title, journal, publication_year and license. Note that some values may be missing (for example, not all articles have a pmid or doi).
authors.csv contains one row per article per author. Its fields are pmcid, surname and given-names.
text.csv contains one row per article. The first field is the pmcid; the other fields are title, keywords, abstract and body, each holding the text extracted from that part of the article.
coordinates.csv contains one row for each (x, y, z) stereotactic coordinate found in any article. Its fields are the pmcid of the article, the label and id of the table the coordinates came from, and x, y, z.
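
Because every CSV file shares the pmcid field, the files can be joined on it. The sketch below groups the coordinates by article and attaches each article's title, using only the Python standard library. It assumes that each file starts with a header row whose column names match the field names listed above (pmcid, title, x, y, z); the function name load_coordinates_by_article is hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path


def load_coordinates_by_article(extracted_dir):
    """Return {pmcid: {"title": ..., "coordinates": [(x, y, z), ...]}}.

    Assumes metadata.csv and coordinates.csv have header rows with the
    column names pmcid, title, x, y and z.
    """
    extracted_dir = Path(extracted_dir)

    # Map each pmcid to its title.
    with open(extracted_dir / "metadata.csv", newline="", encoding="utf-8") as f:
        titles = {row["pmcid"]: row["title"] for row in csv.DictReader(f)}

    # Collect every (x, y, z) triple under the article it came from.
    coords = defaultdict(list)
    with open(extracted_dir / "coordinates.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            xyz = (float(row["x"]), float(row["y"]), float(row["z"]))
            coords[row["pmcid"]].append(xyz)

    # Articles without coordinates are left out; a missing title becomes None.
    return {
        pmcid: {"title": titles.get(pmcid), "coordinates": xyz_list}
        for pmcid, xyz_list in coords.items()
    }
```

The pmcid values are kept as strings, exactly as read from the files, so the same keys can be used to look up rows in authors.csv or text.csv.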
