In GEO, files within samples (GSM) can share the same sample title. geofetch derives `sample_name` from these titles, so several files can end up with the same `sample_name`. In that case peppy creates a single sample for those files, and that sample has attributes (columns) whose values are lists with one element per file.
e.g. https://pephub.databio.org/pep/geo/GSE131026/view?tag=default
This geofetch behavior can complicate the processing of BED files for bedbase. The obstacles we may face:
1) Some attribute values will be strings and others will be lists, so every downstream step has to handle both cases.
2) One sample can have several attributes that contain lists, e.g. file and file_format. Because these two lists are not linked, we can't be sure which file format belongs to which file.
In my opinion, PEPs for processed files should be file-centric, so that each file gets a unique `sample_name`.
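
To make the file-centric idea concrete, here is a minimal sketch in plain Python. It is not geofetch or peppy code: the helper `split_sample_by_file`, the dictionary layout, and the `_1`, `_2` suffix scheme are all assumptions made for illustration. Other list-valued attributes (such as `file_format`) are dropped on purpose, since they can't be matched to files reliably.

```python
def as_list(value):
    """Wrap a scalar in a list so strings and lists are handled uniformly."""
    return value if isinstance(value, list) else [value]


def split_sample_by_file(sample, file_attr="file"):
    """Hypothetical helper: turn one merged sample into one record per file.

    Scalar attributes are copied to every record. List-valued attributes
    other than `file_attr` are dropped, because nothing links their
    elements to individual files.
    """
    files = as_list(sample[file_attr])
    records = []
    for index, path in enumerate(files, start=1):
        # Keep only scalar attributes; they apply to every file equally.
        record = {k: v for k, v in sample.items() if not isinstance(v, list)}
        record[file_attr] = path
        # Assumed naming scheme: add a numeric suffix only when needed.
        if len(files) > 1:
            record["sample_name"] = f"{sample['sample_name']}_{index}"
        records.append(record)
    return records


# Made-up sample resembling what a merged GSM entry could look like.
merged = {
    "sample_name": "h3k27ac_chip",
    "organism": "Homo sapiens",
    "file": ["peaks_rep1.bed.gz", "peaks_rep2.narrowPeak.gz"],
    "file_format": ["bed", "narrowPeak"],
}
records = split_sample_by_file(merged)
```

With this layout every record carries exactly one file and a `sample_name` that is unique within the project, so downstream code only ever sees strings.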