Request for More Information on Hardcoded References in Preprocessing

Hello,

I have been exploring your NeuralCVD repository for our study and appreciate the considerable effort put into this tool. We believe it has the potential to make a significant contribution to our research. However, I have run into some difficulties during the preprocessing step.

The tool appears to have hardcoded references to files under:

path = "/data/analysis/ag-reils/steinfej/code/umbrella/pre/ukbb"
data_path = "/data/analysis/ag-reils/ag-reils-shared/cardioRS/data"

specifically in the subfolder named mapping. It also reads the following files:

codes_gp_records = pd.read_feather(f"{data_path}/1_decoded/codes_gp_diagnoses_210119.feather").drop("level", axis=1)
codes_hospital_records = pd.read_feather(f"{data_path}/1_decoded/codes_hes_diagnoses_210120.feather")

which are not included in the output of "0_decode_ukbb.ipynb".
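To make it easier to pin down exactly what is missing, here is a minimal sketch of the kind of check I have in mind. It only uses the paths quoted above. The environment variable NEURALCVD_DATA_PATH and the helper find_missing_files are hypothetical names of my own, not something the repository provides.

```python
import os
from pathlib import Path

# Default taken from the hardcoded data_path quoted above.
DEFAULT_DATA_PATH = "/data/analysis/ag-reils/ag-reils-shared/cardioRS/data"

# Relative paths taken from the two pd.read_feather calls quoted above.
EXPECTED_FILES = [
    "1_decoded/codes_gp_diagnoses_210119.feather",
    "1_decoded/codes_hes_diagnoses_210120.feather",
]


def find_missing_files(data_path, expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under data_path."""
    root = Path(data_path)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Hypothetical: NEURALCVD_DATA_PATH is an illustrative variable name,
    # not a setting that the repository currently reads.
    data_path = os.environ.get("NEURALCVD_DATA_PATH", DEFAULT_DATA_PATH)
    for rel in find_missing_files(data_path):
        print(f"missing: {data_path}/{rel}")
```

Reading the root folder from a setting like this, rather than from a hardcoded string, would also let other users point the notebooks at their own copy of the data.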

I understand that the tool uses the UK Biobank codings, and I am able to obtain those. However, the source of several other datasets is not clear to me: atc, phecodes, snomed_cor_list, and athena_vocabulary_covid. I cannot confirm that the copies I can find match what the tool expects in content and format. To run the tool correctly and ensure the validity of our results, we need the same version and format of these specific datasets. Unfortunately, the repository currently does not provide enough detail to reproduce this setup.

I would therefore kindly ask you to share the referenced data directly, if that is possible and permitted.

If direct access is not feasible due to any constraints, could you please provide further information on how to obtain or generate these datasets? Ideally, this would include the specific versions of the datasets, the expected formats, and any preprocessing steps required for compatibility with NeuralCVD.
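If sharing the files themselves is not possible, even a list of file names, sizes, and checksums for the mapping folder would let us confirm that we have obtained identical versions. Below is a minimal sketch of how such a list could be produced, assuming the datasets are ordinary files in a folder. The function names sha256_of_file and build_manifest are hypothetical and not part of NeuralCVD.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file under folder (by relative path) to its size and checksum."""
    root = Path(folder)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(root).as_posix()] = {
                "bytes": path.stat().st_size,
                "sha256": sha256_of_file(path),
            }
    return manifest


if __name__ == "__main__":
    # Hypothetical usage: "mapping" stands in for the real mapping folder.
    for name, info in build_manifest("mapping").items():
        print(f"{info['sha256']}  {info['bytes']:>12}  {name}")
```

Matching checksums would only show that two copies are byte-identical. They would not replace a description of the expected columns and formats.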

Your assistance would greatly help us overcome this roadblock and make effective use of this tool in our research.

Thank you for your time and for your invaluable contributions to the field.

Best regards, Shaun

thbuerg / NeuralCVD
