kvegesan-stjude closed this issue 4 months ago
Hi kvegesan-stjude, thanks for reaching out! I ran the Quickstart myself locally and did not experience this issue. I think you are using an older version of immuneML. We have recently made a lot of major updates, including a different internal format for storing the files and changes to the simulation instruction, both affecting how the Quickstart works internally. Could you update to the latest version (3.0.0a4) and let me know if you are still experiencing this issue?
Thank you for the quick response. I have installed immuneML 3.0.0a4 in a fresh environment. The default version on pip and conda is 2.2.6.
(immuneml_env) [kvegesan@noderome105 immuneML]$ conda list|grep immune
# packages in environment at /home/kvegesan/.conda/envs/immuneml_env:
immuneml 3.0.0a4 pypi_0 pypi
I still get the same error. This is the output of log.txt
2024-05-09 10:43:55,943 ERROR:
--- Exception in parse_dataset : ImportHelper: error when importing file c3178776d43f4b0d94983f8220fc7d3d.tsv: Bool column has NA values in column 2
ImportHelper: an error occurred during dataset import while parsing the input file: quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr/repertoires/c3178776d43f4b0d94983f8220fc7d3d.tsv.
Please make sure this is a correct immune receptor data file (not metadata).
The parameters used for import are DatasetImportParams(path=PosixPath('quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr'), is_repertoire=True, metadata_file=PosixPath('quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr/metadata.csv'), paired=False, receptor_chains=None, result_path=PosixPath('quickstart_results/machine_learning_analysis/result/datasets/d1'), columns_to_load=None, separator='\t', column_mapping={'junction': 'sequence', 'junction_aa': 'sequence_aa', 'locus': 'chain'}, column_mapping_synonyms=None, region_type=<RegionType.IMGT_CDR3: 'IMGT_CDR3'>, import_productive=True, import_unknown_productivity=True, import_unproductive=None, import_with_stop_codon=False, import_out_of_frame=False, import_illegal_characters=True, metadata_column_mapping=None, number_of_processes=1, sequence_file_size=50000, organism=None, import_empty_nt_sequences=True, import_empty_aa_sequences=False).
For technical description of the error, see the log above. For details on how to specify the dataset import, see the documentation.
An error occurred while parsing the dataset d1. See the log above for more details.
This is the full log
(immuneml_env) [kvegesan@noderome105 immuneML]$ immune-ml-quickstart ./quickstart_results/ > log.txt
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/util/ImportHelper.py", line 151, in load_sequence_dataframe
df = alternative_load_func(filepath, params)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/IO/dataset_import/AIRRImport.py", line 155, in alternative_load_func
df = airr.load_rearrangement(filename)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/airr/interface.py", line 103, in load_rearrangement
df = pd.read_csv(filename, sep='\t', header=0, index_col=None,
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
return _read(filepath_or_buffer, kwds)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 583, in _read
return parser.read(nrows)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1704, in read
) = self._engine.read( # type: ignore[attr-defined]
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas/_libs/parsers.pyx", line 814, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 1036, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas/_libs/parsers.pyx", line 1075, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas/_libs/parsers.pyx", line 1220, in pandas._libs.parsers.TextReader._convert_with_dtype
ValueError: Bool column has NA values in column 2
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/util/ImportHelper.py", line 130, in load_repertoire_as_object
dataframe = ImportHelper.load_sequence_dataframe(filename, params, alternative_load_func)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/util/ImportHelper.py", line 155, in load_sequence_dataframe
raise Exception(
Exception: Bool column has NA values in column 2
ImportHelper: an error occurred during dataset import while parsing the input file: quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr/repertoires/c3178776d43f4b0d94983f8220fc7d3d.tsv.
Please make sure this is a correct immune receptor data file (not metadata).
The parameters used for import are DatasetImportParams(path=PosixPath('quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr'), is_repertoire=True, metadata_file=PosixPath('quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr/metadata.csv'), paired=False, receptor_chains=None, result_path=PosixPath('quickstart_results/machine_learning_analysis/result/datasets/d1'), columns_to_load=None, separator='\t', column_mapping={'junction': 'sequence', 'junction_aa': 'sequence_aa', 'locus': 'chain'}, column_mapping_synonyms=None, region_type=<RegionType.IMGT_CDR3: 'IMGT_CDR3'>, import_productive=True, import_unknown_productivity=True, import_unproductive=None, import_with_stop_codon=False, import_out_of_frame=False, import_illegal_characters=True, metadata_column_mapping=None, number_of_processes=1, sequence_file_size=50000, organism=None, import_empty_nt_sequences=True, import_empty_aa_sequences=False).
For technical description of the error, see the log above. For details on how to specify the dataset import, see the documentation.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/multiprocessing/pool.py", line 51, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/util/ImportHelper.py", line 144, in load_repertoire_as_object
raise RuntimeError(
RuntimeError: ImportHelper: error when importing file c3178776d43f4b0d94983f8220fc7d3d.tsv: Bool column has NA values in column 2
ImportHelper: an error occurred during dataset import while parsing the input file: quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr/repertoires/c3178776d43f4b0d94983f8220fc7d3d.tsv.
Please make sure this is a correct immune receptor data file (not metadata).
The parameters used for import are DatasetImportParams(path=PosixPath('quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr'), is_repertoire=True, metadata_file=PosixPath('quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr/metadata.csv'), paired=False, receptor_chains=None, result_path=PosixPath('quickstart_results/machine_learning_analysis/result/datasets/d1'), columns_to_load=None, separator='\t', column_mapping={'junction': 'sequence', 'junction_aa': 'sequence_aa', 'locus': 'chain'}, column_mapping_synonyms=None, region_type=<RegionType.IMGT_CDR3: 'IMGT_CDR3'>, import_productive=True, import_unknown_productivity=True, import_unproductive=None, import_with_stop_codon=False, import_out_of_frame=False, import_illegal_characters=True, metadata_column_mapping=None, number_of_processes=1, sequence_file_size=50000, organism=None, import_empty_nt_sequences=True, import_empty_aa_sequences=False).
For technical description of the error, see the log above. For details on how to specify the dataset import, see the documentation.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/import_parsers/ImportParser.py", line 59, in parse_dataset
dataset = import_cls.import_dataset(params, key)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/IO/dataset_import/AIRRImport.py", line 105, in import_dataset
return ImportHelper.import_dataset(AIRRImport, params, dataset_name)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/util/ImportHelper.py", line 51, in import_dataset
dataset = ImportHelper.import_repertoire_dataset(import_class, processed_params, dataset_name)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/util/ImportHelper.py", line 98, in import_repertoire_dataset
repertoires = pool.starmap(ImportHelper.load_repertoire_as_object, arguments)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/multiprocessing/pool.py", line 372, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
RuntimeError: ImportHelper: error when importing file c3178776d43f4b0d94983f8220fc7d3d.tsv: Bool column has NA values in column 2
ImportHelper: an error occurred during dataset import while parsing the input file: quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr/repertoires/c3178776d43f4b0d94983f8220fc7d3d.tsv.
Please make sure this is a correct immune receptor data file (not metadata).
The parameters used for import are DatasetImportParams(path=PosixPath('quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr'), is_repertoire=True, metadata_file=PosixPath('quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr/metadata.csv'), paired=False, receptor_chains=None, result_path=PosixPath('quickstart_results/machine_learning_analysis/result/datasets/d1'), columns_to_load=None, separator='\t', column_mapping={'junction': 'sequence', 'junction_aa': 'sequence_aa', 'locus': 'chain'}, column_mapping_synonyms=None, region_type=<RegionType.IMGT_CDR3: 'IMGT_CDR3'>, import_productive=True, import_unknown_productivity=True, import_unproductive=None, import_with_stop_codon=False, import_out_of_frame=False, import_illegal_characters=True, metadata_column_mapping=None, number_of_processes=1, sequence_file_size=50000, organism=None, import_empty_nt_sequences=True, import_empty_aa_sequences=False).
For technical description of the error, see the log above. For details on how to specify the dataset import, see the documentation.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/util/Logger.py", line 10, in wrapped
return func(*args, **kwargs)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/import_parsers/ImportParser.py", line 68, in parse_dataset
raise Exception(f"{ex}\n\nAn error occurred while parsing the dataset {key}. See the log above for more details.")
Exception: ImportHelper: error when importing file c3178776d43f4b0d94983f8220fc7d3d.tsv: Bool column has NA values in column 2
ImportHelper: an error occurred during dataset import while parsing the input file: quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr/repertoires/c3178776d43f4b0d94983f8220fc7d3d.tsv.
Please make sure this is a correct immune receptor data file (not metadata).
The parameters used for import are DatasetImportParams(path=PosixPath('quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr'), is_repertoire=True, metadata_file=PosixPath('quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr/metadata.csv'), paired=False, receptor_chains=None, result_path=PosixPath('quickstart_results/machine_learning_analysis/result/datasets/d1'), columns_to_load=None, separator='\t', column_mapping={'junction': 'sequence', 'junction_aa': 'sequence_aa', 'locus': 'chain'}, column_mapping_synonyms=None, region_type=<RegionType.IMGT_CDR3: 'IMGT_CDR3'>, import_productive=True, import_unknown_productivity=True, import_unproductive=None, import_with_stop_codon=False, import_out_of_frame=False, import_illegal_characters=True, metadata_column_mapping=None, number_of_processes=1, sequence_file_size=50000, organism=None, import_empty_nt_sequences=True, import_empty_aa_sequences=False).
For technical description of the error, see the log above. For details on how to specify the dataset import, see the documentation.
An error occurred while parsing the dataset d1. See the log above for more details.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/kvegesan/.conda/envs/immuneml_env/bin/immune-ml-quickstart", line 8, in <module>
sys.exit(main())
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/workflows/instructions/quickstart.py", line 198, in main
quickstart.run(sys.argv[1] if len(sys.argv) == 2 else None)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/workflows/instructions/quickstart.py", line 191, in run
app.run()
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/app/ImmuneMLApp.py", line 45, in run
symbol_table, self._specification_path = ImmuneMLParser.parse_yaml_file(self._specification_path, self._result_path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/ImmuneMLParser.py", line 119, in parse_yaml_file
symbol_table, path = ImmuneMLParser.parse(workflow_specification, file_path, result_path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/ImmuneMLParser.py", line 141, in parse
def_parser_output, specs_defs = DefinitionParser.parse(workflow_specification, symbol_table, result_path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/definition_parsers/DefinitionParser.py", line 51, in parse
symbol_table, new_specs = DefinitionParser._call_if_exists(parser.keyword, parser.parse, specs,
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/definition_parsers/DefinitionParser.py", line 61, in _call_if_exists
return method(specs[key], symbol_table, path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/import_parsers/ImportParser.py", line 25, in parse
dataset = ImportParser.parse_dataset(key, workflow_specification[key], path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/util/Logger.py", line 14, in wrapped
raise Exception(f"{e}\n\n"
Exception: ImportHelper: error when importing file c3178776d43f4b0d94983f8220fc7d3d.tsv: Bool column has NA values in column 2
ImportHelper: an error occurred during dataset import while parsing the input file: quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr/repertoires/c3178776d43f4b0d94983f8220fc7d3d.tsv.
Please make sure this is a correct immune receptor data file (not metadata).
The parameters used for import are DatasetImportParams(path=PosixPath('quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr'), is_repertoire=True, metadata_file=PosixPath('quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr/metadata.csv'), paired=False, receptor_chains=None, result_path=PosixPath('quickstart_results/machine_learning_analysis/result/datasets/d1'), columns_to_load=None, separator='\t', column_mapping={'junction': 'sequence', 'junction_aa': 'sequence_aa', 'locus': 'chain'}, column_mapping_synonyms=None, region_type=<RegionType.IMGT_CDR3: 'IMGT_CDR3'>, import_productive=True, import_unknown_productivity=True, import_unproductive=None, import_with_stop_codon=False, import_out_of_frame=False, import_illegal_characters=True, metadata_column_mapping=None, number_of_processes=1, sequence_file_size=50000, organism=None, import_empty_nt_sequences=True, import_empty_aa_sequences=False).
For technical description of the error, see the log above. For details on how to specify the dataset import, see the documentation.
An error occurred while parsing the dataset d1. See the log above for more details.
ImmuneMLParser: an error occurred during parsing in function parse_dataset with parameters: ('d1', {'format': 'AIRR', 'params': {'is_repertoire': True, 'path': PosixPath('quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr'), 'paired': False, 'import_productive': True, 'import_unknown_productivity': True, 'import_with_stop_codon': False, 'import_out_of_frame': False, 'import_illegal_characters': True, 'region_type': 'IMGT_CDR3', 'separator': '\t', 'column_mapping': {'junction': 'sequence', 'junction_aa': 'sequence_aa', 'locus': 'chain'}, 'import_empty_nt_sequences': True, 'import_empty_aa_sequences': False, 'metadata_file': PosixPath('quickstart_results/synthetic_dataset/result/simulation_instruction/exported_dataset/airr/metadata.csv'), 'result_path': PosixPath('quickstart_results/machine_learning_analysis/result/datasets/d1')}}, PosixPath('quickstart_results/machine_learning_analysis/result')).
For more details on how to write the specification, see the documentation. For technical description of the error, see the log above.
Thanks for the info! I'll have a bit more time to look into the details tomorrow. It's a little challenging to debug since I'm not experiencing the same issue on my side with this version, so it could help me a lot if you could share the following with me:
Also, what operating system are you using: Windows, Linux, or Mac?
These are the packages I have in the environment. env.txt
This is the zipped file of the quickstart analysis. There is one log file, but I'm not sure if there are others. quickstart_results.zip
I'm on a Red Hat Enterprise Linux 8 environment. This is my organization's computing cluster.
Hi kvegesan-stjude, I haven't been able to reproduce the issue yet, but I suspect it may be triggered by the fact that simulated sequences in repertoires have an unknown status for the fields "productive"/"vj_in_frame" when exported to AIRR format (resulting in a mix of "True" and "nan" values). I haven't yet pinpointed why this results in an error for you and not for me, but I'm almost certain it's due to some dependency version difference, and I will need to spend a bit more time next week to figure this out.
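For anyone following along, here is a minimal sketch of how a boolean column with missing values can produce the "Bool column has NA values" message when a TSV is read with pandas. It does not reproduce what airr or immuneML do internally; the tiny table and the helper name `try_read` are made up for illustration, and it assumes pandas is installed and the default C parser is used.

```python
import io

import pandas as pd

# Made-up AIRR-like table: the second row has no value for "productive".
TSV = (
    "sequence_id\tjunction_aa\tproductive\n"
    "seq1\tCASSLGTDTQYF\tTrue\n"
    "seq2\tCASSPGQGNEQFF\t\n"
)


def try_read(dtype):
    """Read the table with the given dtype mapping and report the outcome."""
    try:
        df = pd.read_csv(io.StringIO(TSV), sep="\t", dtype=dtype)
        return "ok: " + str(df["productive"].dtype)
    except ValueError as err:
        return "error: " + str(err)


# A plain bool dtype cannot hold missing values, so the parser refuses the column.
strict = try_read({"productive": bool})
# The nullable "boolean" dtype accepts the same data and stores the gap as <NA>.
lenient = try_read({"productive": "boolean"})

print(strict)
print(lenient)
```

If the strict read fails and the lenient one succeeds, that points to how the file is being read rather than to a corrupt file.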
To help me confirm whether this is the root of the issue, could you run the following 3 tiny examples and let me know which of them work or fail, and with what errors: debugging_example.zip. Each example run simply imports and exports a dataset consisting of 1 tiny repertoire.
Could you try reinstalling the airr dependency? We are using the same version (1.3.1), but your traceback seems to indicate that your airr installation internally tries to call pandas. In my airr installation, it's not calling pandas but there are some commented out lines which do so. I wonder if there may be multiple airr packages installed simultaneously (try running "pip uninstall airr" several times).
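One way to check for duplicate installations before uninstalling anything is to ask Python which distributions it can see and which file an import would load. This is only a sketch using the standard library (`importlib.metadata`, Python 3.8+); the helper name `find_installed_copies` is hypothetical, and it will not detect every way an environment can end up with a stray module.

```python
from importlib import metadata, util


def find_installed_copies(package_name):
    """Return (version, location) for every installed distribution with this name."""
    wanted = package_name.lower().replace("_", "-")
    copies = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name", "") or ""
        if name.lower().replace("_", "-") == wanted:
            copies.append((dist.version, str(dist.locate_file(""))))
    return copies


# More than one entry here would suggest overlapping installations.
for version, location in find_installed_copies("airr"):
    print("airr", version, "installed under", location)

# Show which file "import airr" would load, if the module can be found at all.
spec = util.find_spec("airr")
print("import resolves to:", spec.origin if spec else "not found")
```

Run it inside the activated environment (here, immuneml_env) so that it inspects the same site-packages directory that appears in the traceback.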
Removing airr and reinstalling it did the trick. I was able to run the quickstart.
I also ran the 3 examples. Spec1 failed due to a type mismatch error:
2024-05-10 11:41:15.597605: Running immuneML version 3.0.0a4
2024-05-10 11:41:15.597962: Setting temporary cache path to spec1/cache
2024-05-10 11:41:15.598005: immuneML: parsing the specification...
2024-05-10 11:41:16.126492:
Imported repertoire dataset my_dataset:
Example count: 1
Labels: {'my_signal', 'sim_item', 'type_dict'}
Traceback (most recent call last):
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/util/Logger.py", line 10, in wrapped
return func(*args, **kwargs)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/InstructionParser.py", line 67, in parse_instruction
instruction_object = parser.parse(key, instruction, symbol_table, path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/instruction_parsers/DatasetExportParser.py", line 65, in parse
ParameterValidator.assert_type_and_value(instruction["number_of_processes"], int, location, "number_of_processes", 1)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/util/ParameterValidator.py", line 42, in assert_type_and_value
assert isinstance(value, parameter_type), f"{base_mssg}It has to be of type {type_name}, but is now of type {type(value).__name__}."
AssertionError: DatasetExportParser: None is not a valid value for parameter number_of_processes. It has to be of type int, but is now of type NoneType.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/kvegesan/.conda/envs/immuneml_env/bin/immune-ml", line 8, in <module>
sys.exit(main())
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/app/ImmuneMLApp.py", line 90, in main
run_immuneML(namespace)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/app/ImmuneMLApp.py", line 75, in run_immuneML
app.run()
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/app/ImmuneMLApp.py", line 45, in run
symbol_table, self._specification_path = ImmuneMLParser.parse_yaml_file(self._specification_path, self._result_path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/ImmuneMLParser.py", line 119, in parse_yaml_file
symbol_table, path = ImmuneMLParser.parse(workflow_specification, file_path, result_path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/ImmuneMLParser.py", line 142, in parse
symbol_table, specs_instructions = InstructionParser.parse(def_parser_output, result_path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/InstructionParser.py", line 50, in parse
InstructionParser.parse_instruction(key, specification[InstructionParser.keyword][key], symbol_table, path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/util/Logger.py", line 14, in wrapped
raise Exception(f"{e}\n\n"
Exception: DatasetExportParser: None is not a valid value for parameter number_of_processes. It has to be of type int, but is now of type NoneType.
ImmuneMLParser: an error occurred during parsing in function parse_instruction with parameters: ('export_dataset', {'type': 'DatasetExport', 'datasets': ['my_dataset'], 'number_of_processes': None, 'export_formats': ['AIRR']}, SymbolTable(), PosixPath('spec1')).
For more details on how to write the specification, see the documentation. For technical description of the error, see the log above.
Spec2 also failed with the same error:
(immuneml_env) [kvegesan@noderome105 debugging_example]$ immune-ml spec2.yaml spec2/
2024-05-10 11:42:28.438035: Running immuneML version 3.0.0a4
2024-05-10 11:42:28.438386: Setting temporary cache path to spec2/cache
2024-05-10 11:42:28.438432: immuneML: parsing the specification...
2024-05-10 11:42:28.777952:
Imported repertoire dataset my_dataset:
Example count: 1
Labels: {'my_signal', 'sim_item', 'type_dict'}
Traceback (most recent call last):
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/util/Logger.py", line 10, in wrapped
return func(*args, **kwargs)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/InstructionParser.py", line 67, in parse_instruction
instruction_object = parser.parse(key, instruction, symbol_table, path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/instruction_parsers/DatasetExportParser.py", line 65, in parse
ParameterValidator.assert_type_and_value(instruction["number_of_processes"], int, location, "number_of_processes", 1)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/util/ParameterValidator.py", line 42, in assert_type_and_value
assert isinstance(value, parameter_type), f"{base_mssg}It has to be of type {type_name}, but is now of type {type(value).__name__}."
AssertionError: DatasetExportParser: None is not a valid value for parameter number_of_processes. It has to be of type int, but is now of type NoneType.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/kvegesan/.conda/envs/immuneml_env/bin/immune-ml", line 8, in <module>
sys.exit(main())
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/app/ImmuneMLApp.py", line 90, in main
run_immuneML(namespace)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/app/ImmuneMLApp.py", line 75, in run_immuneML
app.run()
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/app/ImmuneMLApp.py", line 45, in run
symbol_table, self._specification_path = ImmuneMLParser.parse_yaml_file(self._specification_path, self._result_path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/ImmuneMLParser.py", line 119, in parse_yaml_file
symbol_table, path = ImmuneMLParser.parse(workflow_specification, file_path, result_path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/ImmuneMLParser.py", line 142, in parse
symbol_table, specs_instructions = InstructionParser.parse(def_parser_output, result_path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/InstructionParser.py", line 50, in parse
InstructionParser.parse_instruction(key, specification[InstructionParser.keyword][key], symbol_table, path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/util/Logger.py", line 14, in wrapped
raise Exception(f"{e}\n\n"
Exception: DatasetExportParser: None is not a valid value for parameter number_of_processes. It has to be of type int, but is now of type NoneType.
ImmuneMLParser: an error occurred during parsing in function parse_instruction with parameters: ('export_dataset', {'type': 'DatasetExport', 'datasets': ['my_dataset'], 'number_of_processes': None, 'export_formats': ['AIRR']}, SymbolTable(), PosixPath('spec2')).
For more details on how to write the specification, see the documentation. For technical description of the error, see the log above.
Spec3 also had the same error:
(immuneml_env) [kvegesan@noderome105 debugging_example]$ immune-ml spec3.yaml spec3/
2024-05-10 11:44:37.764905: Running immuneML version 3.0.0a4
2024-05-10 11:44:37.765271: Setting temporary cache path to spec3/cache
2024-05-10 11:44:37.765318: immuneML: parsing the specification...
2024-05-10 11:44:38.129818:
Imported repertoire dataset my_dataset:
Example count: 1
Labels: {'type_dict', 'my_signal', 'sim_item'}
Traceback (most recent call last):
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/util/Logger.py", line 10, in wrapped
return func(*args, **kwargs)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/InstructionParser.py", line 67, in parse_instruction
instruction_object = parser.parse(key, instruction, symbol_table, path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/instruction_parsers/DatasetExportParser.py", line 65, in parse
ParameterValidator.assert_type_and_value(instruction["number_of_processes"], int, location, "number_of_processes", 1)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/util/ParameterValidator.py", line 42, in assert_type_and_value
assert isinstance(value, parameter_type), f"{base_mssg}It has to be of type {type_name}, but is now of type {type(value).__name__}."
AssertionError: DatasetExportParser: None is not a valid value for parameter number_of_processes. It has to be of type int, but is now of type NoneType.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/kvegesan/.conda/envs/immuneml_env/bin/immune-ml", line 8, in <module>
sys.exit(main())
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/app/ImmuneMLApp.py", line 90, in main
run_immuneML(namespace)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/app/ImmuneMLApp.py", line 75, in run_immuneML
app.run()
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/app/ImmuneMLApp.py", line 45, in run
symbol_table, self._specification_path = ImmuneMLParser.parse_yaml_file(self._specification_path, self._result_path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/ImmuneMLParser.py", line 119, in parse_yaml_file
symbol_table, path = ImmuneMLParser.parse(workflow_specification, file_path, result_path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/ImmuneMLParser.py", line 142, in parse
symbol_table, specs_instructions = InstructionParser.parse(def_parser_output, result_path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/dsl/InstructionParser.py", line 50, in parse
InstructionParser.parse_instruction(key, specification[InstructionParser.keyword][key], symbol_table, path)
File "/home/kvegesan/.conda/envs/immuneml_env/lib/python3.8/site-packages/immuneML/util/Logger.py", line 14, in wrapped
raise Exception(f"{e}\n\n"
Exception: DatasetExportParser: None is not a valid value for parameter number_of_processes. It has to be of type int, but is now of type NoneType.
ImmuneMLParser: an error occurred during parsing in function parse_instruction with parameters: ('export_dataset', {'type': 'DatasetExport', 'datasets': ['my_dataset'], 'number_of_processes': None, 'export_formats': ['AIRR']}, SymbolTable(), PosixPath('spec3')).
For more details on how to write the specification, see the documentation. For technical description of the error, see the log above.
Good to hear that reinstalling the airr dependency resolved the issue. I don't think there is any bug that needs to be fixed on the immuneML side in this case.
Apologies for the confusion about the YAML examples. I didn't test run them, and it looks like I forgot the number_of_processes parameter (I thought there was a default value). If you add that parameter (example here: https://docs.immuneml.uio.no/latest/yaml_specs/instructions.html#datasetexport), I believe all those YAMLs should run without issues. Could you give it a try?
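To illustrate why the three specs failed, here is a small stand-in for the type check shown in the traceback. It is not immuneML's own validator: `check_number_of_processes` is a hypothetical helper, the instruction dictionary is copied from the error message above, and the value 4 is just an example.

```python
def check_number_of_processes(instruction):
    """Return an error message if number_of_processes is unusable, else None."""
    value = instruction.get("number_of_processes")
    # bool is a subclass of int in Python, so it is rejected explicitly here.
    if isinstance(value, bool) or not isinstance(value, int):
        return ("number_of_processes has to be of type int, "
                f"but is now of type {type(value).__name__}.")
    if value < 1:
        return "number_of_processes has to be at least 1."
    return None


# Instruction as parsed from the failing specs (the parameter was left out).
failing = {"type": "DatasetExport", "datasets": ["my_dataset"],
           "number_of_processes": None, "export_formats": ["AIRR"]}
# Same instruction with the parameter set to an integer.
fixed = dict(failing, number_of_processes=4)

print(check_number_of_processes(failing))
print(check_number_of_processes(fixed))
```

In the YAML itself, the corresponding change is to add a number_of_processes entry with an integer value to the export_dataset instruction.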
Hello, I've just started using this package and the installation went well. When I tried to run the quickstart analysis I kept running into the error shown below.
I think the error happens when the program tries to read the synthetic dataset in AIRR format, but there is some issue with the way the columns are specified.
I investigated the synthetic airr file rep_0.tsv and found that the sequence_id column has some weird issues. This is an example of the file contents:
This is my output. Any help is appreciated.