opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Irregular structure in `Tractability` folder in `otar001-core` #3337

Open javfg opened 2 weeks ago

javfg commented 2 weeks ago

While working on streamlining PIS, particularly the target step, I've come across a discrepancy in the standard directory structure for the data used by this step.

The data fetched from buckets by PIS for this step are: Essentiality, subcellularLocations, hpa, hallmarks, TEPs, ChemicalProbes, TargetSafety, Tractability.

PIS selects and downloads the file with the latest creation date under a given path. In all other cases, the files sit together in a single directory. ChemicalProbes is a good example.
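As a rough illustration of the "latest creation date in a path" rule, here is a minimal sketch. It is not the PIS implementation: `RemoteFile` and `latest_in_directory` are hypothetical names, and the listing is a plain in-memory list rather than a real bucket listing.

```python
from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class RemoteFile(NamedTuple):
    """Hypothetical stand-in for one entry of a bucket listing."""
    path: str
    created: datetime


def latest_in_directory(files: Iterable[RemoteFile], directory: str) -> Optional[RemoteFile]:
    """Return the most recently created file directly inside `directory`.

    Files in subfolders are ignored, which is why a nested layout
    would need extra logic.
    """
    prefix = directory.rstrip("/") + "/"
    candidates = [
        f for f in files
        if f.path.startswith(prefix) and "/" not in f.path[len(prefix):]
    ]
    return max(candidates, key=lambda f: f.created, default=None)
```

With a flat directory this returns a single file. With release subfolders it finds nothing at the top level, which is the discrepancy described here.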

However, for Tractability, the data is split into subfolders named after the releases.

When I discussed this with @ireneisdoomed yesterday, she said this is because the data is uploaded by a third party. Also, if there is no data in the current release subfolder, we should fall back to a previous one.
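To make the cost concrete, the fallback rule would need custom logic along these lines. This is only a sketch under assumptions: subfolder names are taken to be dotted numeric release identifiers (e.g. `23.09`), and `release_key` and `pick_release_folder` are hypothetical helpers, not existing PIS functions.

```python
from typing import Dict, List, Optional, Tuple


def release_key(name: str) -> Tuple[int, ...]:
    """Turn a dotted release name such as '23.09' into a sortable tuple.

    Assumption: release subfolders use purely numeric, dot-separated names.
    """
    return tuple(int(part) for part in name.split("."))


def pick_release_folder(folders: Dict[str, List[str]], current: str) -> Optional[str]:
    """Pick the current release subfolder, or the closest earlier one with data.

    `folders` maps a subfolder name to the file names it contains.
    """
    eligible = [
        name for name, files in folders.items()
        if files and release_key(name) <= release_key(current)
    ]
    return max(eligible, key=release_key, default=None)
```

If the subfolder for the current release is empty, the function returns the most recent earlier release that has files, or `None` if there is none.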

Given that this structure only occurs in this folder, it would be good to assess whether it can be flattened by removing those subfolders. This would avoid having to add custom logic to PIS for retrieving tractability data.

The files themselves already have the release number as part of their name, so no information would be lost. Besides, the manifest @mbdebian put in place in PIS is a great idea as a source of truth as to which files have gone into a particular release.
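Before flattening, it would be easy to check that no two files end up with the same name. A minimal sketch, where `flatten_paths` is a hypothetical helper and the file names in the usage example are made up for illustration:

```python
import posixpath
from typing import Dict, List


def flatten_paths(paths: List[str], root: str) -> Dict[str, str]:
    """Map each nested path to a path directly under `root`.

    Raises ValueError if two files would collide after flattening.
    """
    mapping: Dict[str, str] = {}
    taken: Dict[str, str] = {}
    for path in paths:
        target = posixpath.join(root, posixpath.basename(path))
        if target in taken:
            raise ValueError(f"{path} and {taken[target]} both map to {target}")
        taken[target] = path
        mapping[path] = target
    return mapping


# Illustrative only: these file names are invented for the example.
plan = flatten_paths(
    ["Tractability/23.09/tractability_23.09.tsv",
     "Tractability/23.12/tractability_23.12.tsv"],
    "Tractability",
)
```

Since the release number is already part of each file name, the check should pass without renaming anything.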

Let me know what you think!