graphnet-team / graphnet

A Deep learning library for neutrino telescopes
https://graphnet-team.github.io/graphnet/
Apache License 2.0

Issue within the parquet dataset class #331

Closed Aske-Rosted closed 1 year ago

Aske-Rosted commented 1 year ago

The bug happens when trying to read or write a parquet file. It seems that parquet_dataset._query_table expects an event number, which it then turns into a sequential index by looking it up in self._indices. However, when I run it, the index argument already seems to be a sequential index (not an event number). The change below seems to work for me, but I do not know about possible knock-on effects; I imagine this function is called in several places.

def _query_table(
    self,
    table: str,
    columns: Union[List[str], str],
    index: int,
    selection: Optional[str] = None,
) -> List[Tuple[Any]]:
    # Check(s)
    assert (
        selection is None
    ), "Argument `selection` is currently not supported"

    # FIX WORKING FOR ME RIGHT HERE: use `index` directly, replacing
    # sequential_index = self._indices.index(index)
    sequential_index = index

    try:
        ak_array = self._parquet_hook[table][columns][sequential_index]
    except ValueError as e:
        if "does not exist (not in record)" in str(e):
            raise ColumnMissingException(str(e))
        else:
            raise e

    dictionary = ak_array.to_list()
    assert list(dictionary.keys()) == columns

    if all(map(np.isscalar, dictionary.values())):
        result = [tuple(dictionary.values())]

    else:
        # All arrays should have same length
        array_lengths = [
            len(values)
            for values in dictionary.values()
            if not np.isscalar(values)
        ]
        assert (
            len(set(array_lengths)) == 1
        ), f"Arrays in {dictionary} have differing lengths"
        nb_elements = array_lengths[0]

        # Broadcast scalars
        for key in dictionary:
            value = dictionary[key]
            if np.isscalar(value):
                dictionary[key] = np.repeat(
                    value, repeats=nb_elements
                ).tolist()

        result = list(map(tuple, list(zip(*dictionary.values()))))

    return result

Expected behavior: sequential_index = self._indices.index(index) was expected to take an index (possibly an event number) and turn it into a sequential index.

Actual behavior: it receives what I take to be a sequential index already and raises the error "<index number> is not in list".
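A minimal sketch of the mismatch described above, using a plain Python list with made-up event numbers in place of self._indices (the values and the helper name to_sequential are assumptions for illustration, not graphnet code). It shows that list.index maps a value to its position, so handing it a position instead of an event number fails in the same way as the reported error.

```python
# Hypothetical event numbers, standing in for self._indices.
event_nos = [1017, 1042, 1093]


def to_sequential(event_no):
    """Map an event number to its position in the list."""
    return event_nos.index(event_no)


# Works when the argument really is an event number.
position = to_sequential(1042)

# Fails when the argument is already a position (here 0),
# because 0 is not one of the stored event numbers.
try:
    to_sequential(0)
except ValueError as err:
    message = str(err)

print(position, "|", message)
```

If the caller already passes sequential positions, as seems to be the case here, the lookup step has nothing valid to look up.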

asogaard commented 1 year ago

What is the error message/log you receive?

Aske-Rosted commented 1 year ago

Running the read dataset example but using my own parquet dataset.

graphnet: INFO 2022-10-28 16:40:37 - get_logger - Writing log to logs/graphnet_20221028-164037.log
graphnet: INFO 2022-10-28 16:40:37 - main - Available columns in SRTInIcePulses
graphnet: INFO 2022-10-28 16:40:37 - main - . charge
graphnet: INFO 2022-10-28 16:40:37 - main - . flags
graphnet: INFO 2022-10-28 16:40:37 - main - . time
graphnet: INFO 2022-10-28 16:40:37 - main - . width
graphnet: INFO 2022-10-28 16:40:37 - main - . area
graphnet: INFO 2022-10-28 16:40:37 - main - . directionazimuth
graphnet: INFO 2022-10-28 16:40:37 - main - . directionphi
graphnet: INFO 2022-10-28 16:40:37 - main - . directiontheta
graphnet: INFO 2022-10-28 16:40:37 - main - . directionx
graphnet: INFO 2022-10-28 16:40:37 - main - . directiony
graphnet: INFO 2022-10-28 16:40:37 - main - . directionz
graphnet: INFO 2022-10-28 16:40:37 - main - . directionzenith
graphnet: INFO 2022-10-28 16:40:37 - main - . positionmag2
graphnet: INFO 2022-10-28 16:40:37 - main - . positionmagnitude
graphnet: INFO 2022-10-28 16:40:37 - main - . positionphi
graphnet: INFO 2022-10-28 16:40:37 - main - . positionr
graphnet: INFO 2022-10-28 16:40:37 - main - . positionrho
graphnet: INFO 2022-10-28 16:40:37 - main - . positiontheta
graphnet: INFO 2022-10-28 16:40:37 - main - . positionx
graphnet: INFO 2022-10-28 16:40:37 - main - . positiony
graphnet: INFO 2022-10-28 16:40:37 - main - . positionz
graphnet: INFO 2022-10-28 16:40:37 - main - . position_list
graphnet: INFO 2022-10-28 16:40:37 - main - . atwd_beacon_baseline__parent
graphnet: INFO 2022-10-28 16:40:37 - main - . atwd_bin_calib_slopeparent
graphnet: INFO 2022-10-28 16:40:37 - main - . atwd_delta_tparent
graphnet: INFO 2022-10-28 16:40:37 - main - . atwd_freq_fitparent
graphnet: INFO 2022-10-28 16:40:37 - main - . atwd_gainparent
graphnet: INFO 2022-10-28 16:40:37 - main - . combined_spe_charge_distributioncompensation_factor
graphnet: INFO 2022-10-28 16:40:37 - main - . combined_spe_charge_distributionexp1_amp
graphnet: INFO 2022-10-28 16:40:37 - main - . combined_spe_charge_distributionexp1_width
graphnet: INFO 2022-10-28 16:40:37 - main - . combined_spe_charge_distributionexp2_amp
graphnet: INFO 2022-10-28 16:40:37 - main - . combined_spe_charge_distributionexp2_width
graphnet: INFO 2022-10-28 16:40:37 - main - . combined_spe_charge_distributiongaus_amp
graphnet: INFO 2022-10-28 16:40:37 - main - . combined_spe_charge_distributiongaus_mean
graphnet: INFO 2022-10-28 16:40:37 - main - . combined_spe_charge_distributiongaus_width
graphnet: INFO 2022-10-28 16:40:37 - main - . combined_spe_charge_distributionis_valid
graphnet: INFO 2022-10-28 16:40:37 - main - . combined_spe_charge_distributionslc_gaus_mean
graphnet: INFO 2022-10-28 16:40:37 - main - . dom_cal_version
graphnet: INFO 2022-10-28 16:40:37 - main - . dom_noise_decay_rate
graphnet: INFO 2022-10-28 16:40:37 - main - . dom_noise_rate
graphnet: INFO 2022-10-28 16:40:37 - main - . dom_noise_scintillation_hits
graphnet: INFO 2022-10-28 16:40:37 - main - . dom_noise_scintillation_mean
graphnet: INFO 2022-10-28 16:40:37 - main - . dom_noise_scintillation_sigma
graphnet: INFO 2022-10-28 16:40:37 - main - . dom_noise_thermal_rate
graphnet: INFO 2022-10-28 16:40:37 - main - . fadc_baseline_fitintercept
graphnet: INFO 2022-10-28 16:40:37 - main - . fadc_baseline_fitslope
graphnet: INFO 2022-10-28 16:40:37 - main - . fadc_beacon_baseline
graphnet: INFO 2022-10-28 16:40:37 - main - . fadc_delta_t
graphnet: INFO 2022-10-28 16:40:37 - main - . fadc_gain
graphnet: INFO 2022-10-28 16:40:37 - main - . front_end_impedance
graphnet: INFO 2022-10-28 16:40:37 - main - . hv_gain_fitintercept
graphnet: INFO 2022-10-28 16:40:37 - main - . hv_gain_fitslope
graphnet: INFO 2022-10-28 16:40:37 - main - . is_mean_atwd_charge_valid
graphnet: INFO 2022-10-28 16:40:37 - main - . is_mean_fadc_charge_valid
graphnet: INFO 2022-10-28 16:40:37 - main - . mean_atwd_charge
graphnet: INFO 2022-10-28 16:40:37 - main - . mean_fadc_charge
graphnet: INFO 2022-10-28 16:40:37 - main - . mpe_disc_calibintercept
graphnet: INFO 2022-10-28 16:40:37 - main - . mpe_disc_calibslope
graphnet: INFO 2022-10-28 16:40:37 - main - . pmt_disc_calibintercept
graphnet: INFO 2022-10-28 16:40:37 - main - . pmt_disc_calibslope
graphnet: INFO 2022-10-28 16:40:37 - main - . relative_dom_eff
graphnet: INFO 2022-10-28 16:40:37 - main - . spe_disc_calibintercept
graphnet: INFO 2022-10-28 16:40:37 - main - . spe_disc_calibslope
graphnet: INFO 2022-10-28 16:40:37 - main - . tau_parametersp0
graphnet: INFO 2022-10-28 16:40:37 - main - . tau_parametersp1
graphnet: INFO 2022-10-28 16:40:37 - main - . tau_parameters__p2
graphnet: INFO 2022-10-28 16:40:37 - main - . tau_parametersp3
graphnet: INFO 2022-10-28 16:40:37 - main - . tau_parametersp4
graphnet: INFO 2022-10-28 16:40:37 - main - . tau_parameters__p5
graphnet: INFO 2022-10-28 16:40:37 - main - . tau_parameterstau_frac
graphnet: INFO 2022-10-28 16:40:37 - main - . temperature
graphnet: INFO 2022-10-28 16:40:37 - main - . transit_timeintercept
graphnet: INFO 2022-10-28 16:40:37 - main - . transittimeslope
graphnet: INFO 2022-10-28 16:40:37 - main - . indexom
graphnet: INFO 2022-10-28 16:40:37 - main - . indexpmt
graphnet: INFO 2022-10-28 16:40:37 - main - . indexstring
graphnet: INFO 2022-10-28 16:40:37 - main - . indexlist
graphnet: INFO 2022-10-28 16:40:37 - main - . event_no
graphnet: INFO 2022-10-28 16:40:37 - main - Available columns in truth
graphnet: INFO 2022-10-28 16:40:37 - main - . energy
graphnet: INFO 2022-10-28 16:40:37 - main - . position_x
graphnet: INFO 2022-10-28 16:40:37 - main - . position_y
graphnet: INFO 2022-10-28 16:40:37 - main - . position_z
graphnet: INFO 2022-10-28 16:40:37 - main - . azimuth
graphnet: INFO 2022-10-28 16:40:37 - main - . zenith
graphnet: INFO 2022-10-28 16:40:37 - main - . pid
graphnet: INFO 2022-10-28 16:40:37 - main - . event_time
graphnet: INFO 2022-10-28 16:40:37 - main - . sim_type
graphnet: INFO 2022-10-28 16:40:37 - main - . interaction_type
graphnet: INFO 2022-10-28 16:40:37 - main - . elasticity
graphnet: INFO 2022-10-28 16:40:37 - main - . RunID
graphnet: INFO 2022-10-28 16:40:37 - main - . SubrunID
graphnet: INFO 2022-10-28 16:40:37 - main - . EventID
graphnet: INFO 2022-10-28 16:40:37 - main - . SubEventID
graphnet: INFO 2022-10-28 16:40:37 - main - . dbang_decay_length
graphnet: INFO 2022-10-28 16:40:37 - main - . track_length
graphnet: INFO 2022-10-28 16:40:37 - main - . stopped_muon
graphnet: INFO 2022-10-28 16:40:37 - main - . energy_track
graphnet: INFO 2022-10-28 16:40:37 - main - . inelasticity
graphnet: INFO 2022-10-28 16:40:37 - main - . DeepCoreFilter_13
graphnet: INFO 2022-10-28 16:40:37 - main - . CascadeFilter_13
graphnet: INFO 2022-10-28 16:40:37 - main - . MuonFilter_13
graphnet: INFO 2022-10-28 16:40:37 - main - . OnlineL2Filter_17
graphnet: INFO 2022-10-28 16:40:37 - main - . L3_oscNext_bool
graphnet: INFO 2022-10-28 16:40:37 - main - . L4_oscNext_bool
graphnet: INFO 2022-10-28 16:40:37 - main - . L5_oscNext_bool
graphnet: INFO 2022-10-28 16:40:37 - main - . L6_oscNext_bool
graphnet: INFO 2022-10-28 16:40:37 - main - . L7_oscNext_bool
graphnet: INFO 2022-10-28 16:40:37 - main - . event_no
Traceback (most recent call last):
  File "/disk20/users/aske/graphnet/personal_scripts/read_dataset.py", line 110, in <module>
    main(backend,path_to_file)
  File "/disk20/users/aske/graphnet/personal_scripts/read_dataset.py", line 79, in main
    truth_table=truth_table,
  File "/misc/disk20/users/aske/graphnet/src/graphnet/data/dataset.py", line 106, in __init__
    self._remove_missing_columns()
  File "/misc/disk20/users/aske/graphnet/src/graphnet/data/dataset.py", line 182, in _remove_missing_columns
    missing = self._check_missing_columns(self._features, pulsemap)
  File "/misc/disk20/users/aske/graphnet/src/graphnet/data/dataset.py", line 220, in _check_missing_columns
    self._query_table(table, [column], 0)
  File "/misc/disk20/users/aske/graphnet/src/graphnet/data/parquet/parquet_dataset.py", line 54, in _query_table
    self._indices.index(index) #sequential_index = index
ValueError: 0 is not in list

asogaard commented 1 year ago

Alright, I see your point. Looks like the Parquet-data reading is going about this backwards. We should probably do something like https://github.com/graphnet-team/graphnet/blob/main/src/graphnet/data/sqlite/sqlite_dataset.py#L55 instead

Aske-Rosted commented 1 year ago

I did try something similar, but that causes issues down the line, because index then equals the actual index value (what I think is the event number), leading to the following error.

graphnet: INFO 2022-10-28 16:50:34 - get_logger - Writing log to logs/graphnet_20221028-165034.log
graphnet: INFO 2022-10-28 16:50:35 - main - Available columns in SRTInIcePulses
graphnet: INFO 2022-10-28 16:50:35 - main - . charge
graphnet: INFO 2022-10-28 16:50:35 - main - . flags
graphnet: INFO 2022-10-28 16:50:35 - main - . time
graphnet: INFO 2022-10-28 16:50:35 - main - . width
graphnet: INFO 2022-10-28 16:50:35 - main - . area
graphnet: INFO 2022-10-28 16:50:35 - main - . directionazimuth
graphnet: INFO 2022-10-28 16:50:35 - main - . directionphi
graphnet: INFO 2022-10-28 16:50:35 - main - . directiontheta
graphnet: INFO 2022-10-28 16:50:35 - main - . directionx
graphnet: INFO 2022-10-28 16:50:35 - main - . directiony
graphnet: INFO 2022-10-28 16:50:35 - main - . directionz
graphnet: INFO 2022-10-28 16:50:35 - main - . directionzenith
graphnet: INFO 2022-10-28 16:50:35 - main - . positionmag2
graphnet: INFO 2022-10-28 16:50:35 - main - . positionmagnitude
graphnet: INFO 2022-10-28 16:50:35 - main - . positionphi
graphnet: INFO 2022-10-28 16:50:35 - main - . positionr
graphnet: INFO 2022-10-28 16:50:35 - main - . positionrho
graphnet: INFO 2022-10-28 16:50:35 - main - . positiontheta
graphnet: INFO 2022-10-28 16:50:35 - main - . positionx
graphnet: INFO 2022-10-28 16:50:35 - main - . positiony
graphnet: INFO 2022-10-28 16:50:35 - main - . positionz
graphnet: INFO 2022-10-28 16:50:35 - main - . position_list
graphnet: INFO 2022-10-28 16:50:35 - main - . atwd_beacon_baseline__parent
graphnet: INFO 2022-10-28 16:50:35 - main - . atwd_bin_calib_slopeparent
graphnet: INFO 2022-10-28 16:50:35 - main - . atwd_delta_tparent
graphnet: INFO 2022-10-28 16:50:35 - main - . atwd_freq_fitparent
graphnet: INFO 2022-10-28 16:50:35 - main - . atwd_gainparent
graphnet: INFO 2022-10-28 16:50:35 - main - . combined_spe_charge_distributioncompensation_factor
graphnet: INFO 2022-10-28 16:50:35 - main - . combined_spe_charge_distributionexp1_amp
graphnet: INFO 2022-10-28 16:50:35 - main - . combined_spe_charge_distributionexp1_width
graphnet: INFO 2022-10-28 16:50:35 - main - . combined_spe_charge_distributionexp2_amp
graphnet: INFO 2022-10-28 16:50:35 - main - . combined_spe_charge_distributionexp2_width
graphnet: INFO 2022-10-28 16:50:35 - main - . combined_spe_charge_distributiongaus_amp
graphnet: INFO 2022-10-28 16:50:35 - main - . combined_spe_charge_distributiongaus_mean
graphnet: INFO 2022-10-28 16:50:35 - main - . combined_spe_charge_distributiongaus_width
graphnet: INFO 2022-10-28 16:50:35 - main - . combined_spe_charge_distributionis_valid
graphnet: INFO 2022-10-28 16:50:35 - main - . combined_spe_charge_distributionslc_gaus_mean
graphnet: INFO 2022-10-28 16:50:35 - main - . dom_cal_version
graphnet: INFO 2022-10-28 16:50:35 - main - . dom_noise_decay_rate
graphnet: INFO 2022-10-28 16:50:35 - main - . dom_noise_rate
graphnet: INFO 2022-10-28 16:50:35 - main - . dom_noise_scintillation_hits
graphnet: INFO 2022-10-28 16:50:35 - main - . dom_noise_scintillation_mean
graphnet: INFO 2022-10-28 16:50:35 - main - . dom_noise_scintillation_sigma
graphnet: INFO 2022-10-28 16:50:35 - main - . dom_noise_thermal_rate
graphnet: INFO 2022-10-28 16:50:35 - main - . fadc_baseline_fitintercept
graphnet: INFO 2022-10-28 16:50:35 - main - . fadc_baseline_fitslope
graphnet: INFO 2022-10-28 16:50:35 - main - . fadc_beacon_baseline
graphnet: INFO 2022-10-28 16:50:35 - main - . fadc_delta_t
graphnet: INFO 2022-10-28 16:50:35 - main - . fadc_gain
graphnet: INFO 2022-10-28 16:50:35 - main - . front_end_impedance
graphnet: INFO 2022-10-28 16:50:35 - main - . hv_gain_fitintercept
graphnet: INFO 2022-10-28 16:50:35 - main - . hv_gain_fitslope
graphnet: INFO 2022-10-28 16:50:35 - main - . is_mean_atwd_charge_valid
graphnet: INFO 2022-10-28 16:50:35 - main - . is_mean_fadc_charge_valid
graphnet: INFO 2022-10-28 16:50:35 - main - . mean_atwd_charge
graphnet: INFO 2022-10-28 16:50:35 - main - . mean_fadc_charge
graphnet: INFO 2022-10-28 16:50:35 - main - . mpe_disc_calibintercept
graphnet: INFO 2022-10-28 16:50:35 - main - . mpe_disc_calibslope
graphnet: INFO 2022-10-28 16:50:35 - main - . pmt_disc_calibintercept
graphnet: INFO 2022-10-28 16:50:35 - main - . pmt_disc_calibslope
graphnet: INFO 2022-10-28 16:50:35 - main - . relative_dom_eff
graphnet: INFO 2022-10-28 16:50:35 - main - . spe_disc_calibintercept
graphnet: INFO 2022-10-28 16:50:35 - main - . spe_disc_calibslope
graphnet: INFO 2022-10-28 16:50:35 - main - . tau_parametersp0
graphnet: INFO 2022-10-28 16:50:35 - main - . tau_parametersp1
graphnet: INFO 2022-10-28 16:50:35 - main - . tau_parameters__p2
graphnet: INFO 2022-10-28 16:50:35 - main - . tau_parametersp3
graphnet: INFO 2022-10-28 16:50:35 - main - . tau_parametersp4
graphnet: INFO 2022-10-28 16:50:35 - main - . tau_parameters__p5
graphnet: INFO 2022-10-28 16:50:35 - main - . tau_parameterstau_frac
graphnet: INFO 2022-10-28 16:50:35 - main - . temperature
graphnet: INFO 2022-10-28 16:50:35 - main - . transit_timeintercept
graphnet: INFO 2022-10-28 16:50:35 - main - . transittimeslope
graphnet: INFO 2022-10-28 16:50:35 - main - . indexom
graphnet: INFO 2022-10-28 16:50:35 - main - . indexpmt
graphnet: INFO 2022-10-28 16:50:35 - main - . indexstring
graphnet: INFO 2022-10-28 16:50:35 - main - . indexlist
graphnet: INFO 2022-10-28 16:50:35 - main - . event_no
graphnet: INFO 2022-10-28 16:50:35 - main - Available columns in truth
graphnet: INFO 2022-10-28 16:50:35 - main - . energy
graphnet: INFO 2022-10-28 16:50:35 - main - . position_x
graphnet: INFO 2022-10-28 16:50:35 - main - . position_y
graphnet: INFO 2022-10-28 16:50:35 - main - . position_z
graphnet: INFO 2022-10-28 16:50:35 - main - . azimuth
graphnet: INFO 2022-10-28 16:50:35 - main - . zenith
graphnet: INFO 2022-10-28 16:50:35 - main - . pid
graphnet: INFO 2022-10-28 16:50:35 - main - . event_time
graphnet: INFO 2022-10-28 16:50:35 - main - . sim_type
graphnet: INFO 2022-10-28 16:50:35 - main - . interaction_type
graphnet: INFO 2022-10-28 16:50:35 - main - . elasticity
graphnet: INFO 2022-10-28 16:50:35 - main - . RunID
graphnet: INFO 2022-10-28 16:50:35 - main - . SubrunID
graphnet: INFO 2022-10-28 16:50:35 - main - . EventID
graphnet: INFO 2022-10-28 16:50:35 - main - . SubEventID
graphnet: INFO 2022-10-28 16:50:35 - main - . dbang_decay_length
graphnet: INFO 2022-10-28 16:50:35 - main - . track_length
graphnet: INFO 2022-10-28 16:50:35 - main - . stopped_muon
graphnet: INFO 2022-10-28 16:50:35 - main - . energy_track
graphnet: INFO 2022-10-28 16:50:35 - main - . inelasticity
graphnet: INFO 2022-10-28 16:50:35 - main - . DeepCoreFilter_13
graphnet: INFO 2022-10-28 16:50:35 - main - . CascadeFilter_13
graphnet: INFO 2022-10-28 16:50:35 - main - . MuonFilter_13
graphnet: INFO 2022-10-28 16:50:35 - main - . OnlineL2Filter_17
graphnet: INFO 2022-10-28 16:50:35 - main - . L3_oscNext_bool
graphnet: INFO 2022-10-28 16:50:35 - main - . L4_oscNext_bool
graphnet: INFO 2022-10-28 16:50:35 - main - . L5_oscNext_bool
graphnet: INFO 2022-10-28 16:50:35 - main - . L6_oscNext_bool
graphnet: INFO 2022-10-28 16:50:35 - main - . L7_oscNext_bool
graphnet: INFO 2022-10-28 16:50:35 - main - . event_no
graphnet: WARNING 2022-10-28 16:50:36 - ParquetDataset._remove_missing_columns - Removing the following (missing) truth variables: interaction_time
0%| | 0/1 [00:06<?, ? batches/s]
Traceback (most recent call last):
  File "/cvmfs/icecube.opensciencegrid.org/py3-v4.1.0/RHEL_7_x86_64/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/cvmfs/icecube.opensciencegrid.org/py3-v4.1.0/RHEL_7_x86_64/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/misc/home/aske/.vscode-server/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/misc/home/aske/.vscode-server/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/misc/home/aske/.vscode-server/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/misc/home/aske/.vscode-server/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 322, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/misc/home/aske/.vscode-server/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 136, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/misc/home/aske/.vscode-server/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/disk20/users/aske/graphnet/personal_scripts/read_dataset.py", line 110, in <module>
    main(backend,path_to_file)
  File "/disk20/users/aske/graphnet/personal_scripts/read_dataset.py", line 98, in main
    for batch in tqdm(dataloader, unit=" batches", colour="green"):
  File "/home/aske/IG/lib/python3.7/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/aske/IG/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/home/aske/IG/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "/home/aske/IG/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/home/aske/IG/lib/python3.7/site-packages/torch/_utils.py", line 457, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/aske/IG/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/aske/IG/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/aske/IG/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/misc/disk20/users/aske/graphnet/src/graphnet/data/dataset.py", line 169, in __getitem__
    features, truth, node_truth, loss_weight = self._query(index)
  File "/misc/disk20/users/aske/graphnet/src/graphnet/data/dataset.py", line 247, in _query
    pulsemap, self._features, index, self._selection
  File "/misc/disk20/users/aske/graphnet/src/graphnet/data/parquet/parquet_dataset.py", line 62, in _query_table
    raise e
  File "/misc/disk20/users/aske/graphnet/src/graphnet/data/parquet/parquet_dataset.py", line 57, in _query_table
    ak_array = self._parquet_hook[table][columns][sequential_index]
  File "/home/aske/IG/lib/python3.7/site-packages/awkward/highlevel.py", line 991, in __getitem__
    tmp = ak._util.wrap(self.layout[where], self._behavior)
ValueError: in RecordArray attempting to get 286, index out of range

(https://github.com/scikit-hep/awkward-1.0/blob/1.10.1/src/libawkward/array/RecordArray.cpp#L792)
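A rough sketch of why this second error appears, assuming that 286 is an event number being used as a row position. A plain Python list stands in for the awkward array here, so the exception type differs (IndexError rather than awkward's ValueError), and the table size and event numbers are made up for illustration.

```python
# Toy table with fewer rows than the largest event number (sizes are made up).
n_events = 100
table = [{"row": k} for k in range(n_events)]

# Hypothetical event numbers; these are labels, not row positions.
event_nos = [286, 301, 512]


def query(position):
    """Positional lookup, analogous to indexing the table by row."""
    return table[position]


# A valid position works.
last_row = query(n_events - 1)

# An event number used as a position falls outside the table.
try:
    query(event_nos[0])
except IndexError as err:
    failure = type(err).__name__

print(last_row, failure)
```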

Aske-Rosted commented 1 year ago

from what I can see we are asking for a number which should be in between [0, n_events].

asogaard commented 1 year ago

> from what I can see we are asking for a number which should be in between [0, n_events].

Yes, if no selection is applied. So we probably need the entries of self._indices to be sequential indices in [0, n_events[ (though they need not be "dense"), rather than a list of event_nos.
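One way to picture this, as a sketch only: a toy dataset whose _indices holds row positions, optionally restricted by a selection. The class TinyDataset, its arguments, and the event numbers are hypothetical and are not graphnet's actual implementation or the change made in the linked PR.

```python
from typing import List, Optional, Sequence


class TinyDataset:
    """Toy stand-in for a table-backed dataset; not graphnet's actual class."""

    def __init__(
        self, rows: Sequence[dict], selection: Optional[List[int]] = None
    ) -> None:
        self._rows = rows
        n_events = len(rows)
        # Row positions in [0, n_events[; sparse when a selection is given.
        if selection is None:
            self._indices = list(range(n_events))
        else:
            self._indices = list(selection)
        assert all(0 <= i < n_events for i in self._indices)

    def __len__(self) -> int:
        return len(self._indices)

    def __getitem__(self, i: int) -> dict:
        # Map the i-th selected item to its row position, then read the row.
        sequential_index = self._indices[i]
        return self._rows[sequential_index]


# Made-up rows whose event numbers are unrelated to their positions.
rows = [{"event_no": 1000 + 7 * k} for k in range(10)]
full = TinyDataset(rows)
subset = TinyDataset(rows, selection=[1, 2, 4, 8])
print(len(full), len(subset), subset[3])
```

With this layout the data loader can always ask for items 0 to len(dataset) - 1, and the event number never needs to act as a position.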

asogaard commented 1 year ago

I checked that I was able to reproduce your error by setting selection to something non-sequential, like selection=[1,2,4,8,...], and the PR in #332 removes the resulting error.