`gwas_study_index` data for backend integration

Similar to the work on variant index (#3350), we would like to serve a gwas_study_index dataset through OS + API.

This dataset is created by different upstream ETL processes (gentropy), but they all write to the same location and share a schema that is validated within the ETL. So effectively, they can be considered a single dataset that we would like to load:

❯ gsutil ls gs://genetics_etl_python_playground/releases/24.06/study_index/
gs://genetics_etl_python_playground/releases/24.06/study_index/eqtl_catalogue/
gs://genetics_etl_python_playground/releases/24.06/study_index/finngen/
gs://genetics_etl_python_playground/releases/24.06/study_index/gwas_catalog/

Some stats:

Parquet size: 114.5 MiB
Rows (studies): 1_971_058

Study type breakdown:

+---------+-------+
|studyType|  count|
+---------+-------+
|     sqtl| 214987| -> No traitFromSourceMappedIds or backgroundTraitFromSourceMappedIds
|     pqtl|    802| -> No traitFromSourceMappedIds or backgroundTraitFromSourceMappedIds
|    tuqtl| 364493| -> No traitFromSourceMappedIds or backgroundTraitFromSourceMappedIds
|     eqtl|1299310| -> No traitFromSourceMappedIds or backgroundTraitFromSourceMappedIds
|     gwas|  91466| -> Null geneId, Null biosampleFromSourceId, 13_112 Not null backgroundTraitFromSourceMappedIds
+---------+-------+
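
To make the annotations in the table concrete, here is a minimal sketch of a per-record check for that nullability pattern. It is not part of gentropy or the backend: the function name `nullability_violations` is hypothetical, and treating a null column and an empty array as equivalent is an assumption.

```python
from typing import Any, List, Mapping

# Study types listed in the breakdown above, other than "gwas".
QTL_STUDY_TYPES = {"eqtl", "pqtl", "sqtl", "tuqtl"}


def nullability_violations(study: Mapping[str, Any]) -> List[str]:
    """Return the columns that break the pattern described in the table."""

    def is_empty(column: str) -> bool:
        # Assumption: a missing key, a null and an empty array/string all
        # count as "not populated".
        value = study.get(column)
        return value is None or len(value) == 0

    study_type = study.get("studyType")
    if study_type in QTL_STUDY_TYPES:
        # QTL studies carry no trait mappings.
        must_be_empty = [
            "traitFromSourceMappedIds",
            "backgroundTraitFromSourceMappedIds",
        ]
    elif study_type == "gwas":
        # GWAS studies carry no gene or biosample reference.
        must_be_empty = ["geneId", "biosampleFromSourceId"]
    else:
        # Unknown study type: flag the studyType column itself.
        return ["studyType"]
    return [column for column in must_be_empty if not is_empty(column)]
```

For example, a record with `studyType` set to `gwas` and a populated `geneId` would be reported as `["geneId"]`, while a QTL record with null trait columns would return an empty list.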

Resolvable entities

Some of these columns are nullable, as described in the table above:

traitFromSourceMappedIds -> diseases. traitFromSourceMappedIds should be converted to a diseases array column containing a list of resolvable disease objects.
backgroundTraitFromSourceMappedIds -> backgroundTraits. Exactly the same as the above.
geneId -> target. All values are Ensembl gene IDs and should be converted into resolvable target objects.

There is a special case for `biosampleFromSourceId`. In the future, we might want to resolve this object, but it has some extra complexities that we would like to postpone to a later time.
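
The column mapping described above could look roughly like the sketch below, written for a single record as a plain dictionary. The function name `to_resolvable` and the `{"id": ...}` shape of the resolvable objects are assumptions for illustration only; the real shape depends on how the backend resolves diseases and targets.

```python
from typing import Any, Dict, Mapping


def to_resolvable(study: Mapping[str, Any]) -> Dict[str, Any]:
    """Rename the ID columns to the entity fields proposed in this issue."""
    out = dict(study)
    trait_ids = out.pop("traitFromSourceMappedIds", None)
    background_ids = out.pop("backgroundTraitFromSourceMappedIds", None)
    gene_id = out.pop("geneId", None)

    # Assumption: null ID columns become empty arrays (or a null target)
    # so that every record ends up with the same shape.
    out["diseases"] = [{"id": trait_id} for trait_id in (trait_ids or [])]
    out["backgroundTraits"] = [{"id": trait_id} for trait_id in (background_ids or [])]
    out["target"] = {"id": gene_id} if gene_id else None

    # biosampleFromSourceId is deliberately passed through unchanged,
    # since resolving it is postponed.
    return out
```

With this sketch, a GWAS record keeps its `biosampleFromSourceId` as is, gets a null `target`, and has its trait IDs wrapped into `diseases` and `backgroundTraits`.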

The latest stable version of the study index aligned with the schema provided above can be found here: gs://genetics_etl_python_playground/releases/24.06/study_index/

Some of the sub-datasets do not contain every column, so I had to use the following option to read everything: `spark.read.option('mergeSchema', 'true').parquet(...)`. In the future, this dataset might come from a single parquet instead of a directory of parquets with compatible schemas.
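
A slightly fuller sketch of that read, passing the three sub-dataset directories listed earlier explicitly. The helper names `study_index_paths` and `read_study_index` are hypothetical, and the sketch assumes a Spark session that is already configured to read from `gs://` paths.

```python
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:  # pyspark is only needed once the reader is actually called
    from pyspark.sql import DataFrame, SparkSession

STUDY_INDEX_ROOT = "gs://genetics_etl_python_playground/releases/24.06/study_index"
# Sub-datasets currently present under the release directory.
SOURCES = ("eqtl_catalogue", "finngen", "gwas_catalog")


def study_index_paths(root: str = STUDY_INDEX_ROOT) -> List[str]:
    """Build one path per upstream source."""
    return [f"{root.rstrip('/')}/{source}/" for source in SOURCES]


def read_study_index(spark: "SparkSession", root: str = STUDY_INDEX_ROOT) -> "DataFrame":
    """Read all sub-datasets as one DataFrame.

    mergeSchema makes Spark take the union of the columns found in the
    parquet files; rows from files lacking a column get nulls for it.
    """
    return spark.read.option("mergeSchema", "true").parquet(*study_index_paths(root))
```

Calling `read_study_index(spark).groupBy("studyType").count().show()` should then produce a breakdown like the one in the stats section.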
