Open SarahOuologuem opened 8 months ago
Hi @SarahOuologuem thanks for opening this PR and thanks @VladimirShitov for the review! I read through it and left some comments with thoughts on some of the conversations. Let me know if I can be of more help to keep this PR moving forward!
Hi @SarahOuologuem I noticed that you implemented tests, which is really great! The test data has not been uploaded to our test S3 bucket yet. Could you provide me with a link so that I can download the data (assuming it is public)? I will put it in our bucket. Otherwise, I think we could quickly connect on Slack. Thanks :)
A general recommendation: it would be great to have more descriptive commit messages, for example "Change tabulation" or "Remove spaces" instead of "Small fixes". That would make it possible to quickly understand what happened without diving deeper into the code.
Hi Sarah!
Have you tried running viash test src/dimred/lsi/config.vsh.yaml?
I get:
=================================== FAILURES ===================================
______________________ test_select_highly_variable_column ______________________
tmp_path = PosixPath('/tmp/pytest-of-root/pytest-0/test_select_highly_variable_co0')
def test_select_highly_variable_column(tmp_path):
output_path = tmp_path / "output_lsi.h5mu"
# run component
cmd_args = [
meta["executable"],
"--input", str(input_path),
"--output", str(output_path),
"--var_input", "highly_variable"
]
> subprocess.run(cmd_args, check=True)
tmp/viash-run-lsi-UhQpoQ.py:81:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
input = None, capture_output = False, timeout = None, check = True
popenargs = (['/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/lsi', '--input', '/viash_automount/tmp/viash_test_...5mu', '--output', '/tmp/pytest-of-root/pytest-0/test_select_highly_variable_co0/output_lsi.h5mu', '--var_input', ...],)
kwargs = {}
process = <Popen: returncode: 1 args: ['/viash_automount/tmp/viash_test_lsi35782840702...>
stdout = None, stderr = None, retcode = 1
def run(*popenargs,
input=None, capture_output=False, timeout=None, check=False, **kwargs):
"""Run command with arguments and return a CompletedProcess instance.
The returned instance will have attributes args, returncode, stdout and
stderr. By default, stdout and stderr are not captured, and those attributes
will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.
If check is True and the exit code was non-zero, it raises a
CalledProcessError. The CalledProcessError object will have the return code
in the returncode attribute, and output & stderr attributes if those streams
were captured.
If timeout is given, and the process takes too long, a TimeoutExpired
exception will be raised.
There is an optional argument "input", allowing you to
pass bytes or a string to the subprocess's stdin. If you use this argument
you may not also use the Popen constructor's "stdin" argument, as
it will be used internally.
By default, all communication is in bytes, and therefore any "input" should
be bytes, and the stdout and stderr will be bytes. If in text mode, any
"input" should be a string, and stdout and stderr will be strings decoded
according to locale encoding, or by "encoding" if set. Text mode is
triggered by setting any of text, encoding, errors or universal_newlines.
The other arguments are the same as for the Popen constructor.
"""
if input is not None:
if kwargs.get('stdin') is not None:
raise ValueError('stdin and input arguments may not both be used.')
kwargs['stdin'] = PIPE
if capture_output:
if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
raise ValueError('stdout and stderr arguments may not be used '
'with capture_output.')
kwargs['stdout'] = PIPE
kwargs['stderr'] = PIPE
with Popen(*popenargs, **kwargs) as process:
try:
stdout, stderr = process.communicate(input, timeout=timeout)
except TimeoutExpired as exc:
process.kill()
if _mswindows:
# Windows accumulates the output in a single blocking
# read() call run on child threads, with the timeout
# being done in a join() on those threads. communicate()
# _after_ kill() is required to collect that and add it
# to the exception.
exc.stdout, exc.stderr = process.communicate()
else:
# POSIX _communicate already populated the output so
# far into the TimeoutExpired exception.
process.wait()
raise
except: # Including KeyboardInterrupt, communicate handled that.
process.kill()
# We don't call process.wait() as .__exit__ does that for us.
raise
retcode = process.poll()
if check and retcode:
> raise CalledProcessError(retcode, process.args,
output=stdout, stderr=stderr)
E subprocess.CalledProcessError: Command '['/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/lsi', '--input', '/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test//concat_test_data/e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu', '--output', '/tmp/pytest-of-root/pytest-0/test_select_highly_variable_co0/output_lsi.h5mu', '--var_input', 'highly_variable']' returned non-zero exit status 1.
/usr/local/lib/python3.9/subprocess.py:528: CalledProcessError
----------------------------- Captured stdout call -----------------------------
2024-01-09 08:26:17,969 INFO Reading /viash_automount/tmp/viash_test_lsi3578284070294657065/test_test//concat_test_data/e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu.
2024-01-09 08:26:18,700 INFO Using modality 'atac' and adata.X for LSI computation
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
File "/tmp/viash-run-lsi-NUVvcM.py", line 93, in <module>
adata_input_layer = subset_vars(adata_input_layer, par["var_input"])
File "/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/subset_vars.py", line 17, in subset_vars
raise ValueError(f"Requested to use .var column '{subset_col}' as a selection of genes, but the column is not available.")
ValueError: Requested to use .var column 'highly_variable' as a selection of genes, but the column is not available.
__________________________ test_selecting_input_layer __________________________
tmp_path = PosixPath('/tmp/pytest-of-root/pytest-0/test_selecting_input_layer0')
def test_selecting_input_layer(tmp_path):
output_path = tmp_path / "output_lsi.h5mu"
# run component
cmd_args = [
meta["executable"],
"--input", str(input_path),
"--output", str(output_path),
"--num_components", "20",
"--layer", "counts"
]
> subprocess.run(cmd_args, check=True)
tmp/viash-run-lsi-UhQpoQ.py:136:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
input = None, capture_output = False, timeout = None, check = True
popenargs = (['/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/lsi', '--input', '/viash_automount/tmp/viash_test_...mu', '--output', '/tmp/pytest-of-root/pytest-0/test_selecting_input_layer0/output_lsi.h5mu', '--num_components', ...],)
kwargs = {}
process = <Popen: returncode: 1 args: ['/viash_automount/tmp/viash_test_lsi35782840702...>
stdout = None, stderr = None, retcode = 1
def run(*popenargs,
input=None, capture_output=False, timeout=None, check=False, **kwargs):
"""Run command with arguments and return a CompletedProcess instance.
The returned instance will have attributes args, returncode, stdout and
stderr. By default, stdout and stderr are not captured, and those attributes
will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.
If check is True and the exit code was non-zero, it raises a
CalledProcessError. The CalledProcessError object will have the return code
in the returncode attribute, and output & stderr attributes if those streams
were captured.
If timeout is given, and the process takes too long, a TimeoutExpired
exception will be raised.
There is an optional argument "input", allowing you to
pass bytes or a string to the subprocess's stdin. If you use this argument
you may not also use the Popen constructor's "stdin" argument, as
it will be used internally.
By default, all communication is in bytes, and therefore any "input" should
be bytes, and the stdout and stderr will be bytes. If in text mode, any
"input" should be a string, and stdout and stderr will be strings decoded
according to locale encoding, or by "encoding" if set. Text mode is
triggered by setting any of text, encoding, errors or universal_newlines.
The other arguments are the same as for the Popen constructor.
"""
if input is not None:
if kwargs.get('stdin') is not None:
raise ValueError('stdin and input arguments may not both be used.')
kwargs['stdin'] = PIPE
if capture_output:
if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
raise ValueError('stdout and stderr arguments may not be used '
'with capture_output.')
kwargs['stdout'] = PIPE
kwargs['stderr'] = PIPE
with Popen(*popenargs, **kwargs) as process:
try:
stdout, stderr = process.communicate(input, timeout=timeout)
except TimeoutExpired as exc:
process.kill()
if _mswindows:
# Windows accumulates the output in a single blocking
# read() call run on child threads, with the timeout
# being done in a join() on those threads. communicate()
# _after_ kill() is required to collect that and add it
# to the exception.
exc.stdout, exc.stderr = process.communicate()
else:
# POSIX _communicate already populated the output so
# far into the TimeoutExpired exception.
process.wait()
raise
except: # Including KeyboardInterrupt, communicate handled that.
process.kill()
# We don't call process.wait() as .__exit__ does that for us.
raise
retcode = process.poll()
if check and retcode:
> raise CalledProcessError(retcode, process.args,
output=stdout, stderr=stderr)
E subprocess.CalledProcessError: Command '['/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/lsi', '--input', '/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test//concat_test_data/e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu', '--output', '/tmp/pytest-of-root/pytest-0/test_selecting_input_layer0/output_lsi.h5mu', '--num_components', '20', '--layer', 'counts']' returned non-zero exit status 1.
/usr/local/lib/python3.9/subprocess.py:528: CalledProcessError
----------------------------- Captured stdout call -----------------------------
2024-01-09 08:26:34,271 INFO Reading /viash_automount/tmp/viash_test_lsi3578284070294657065/test_test//concat_test_data/e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu.
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
File "/tmp/viash-run-lsi-tOA0u2.py", line 80, in <module>
raise ValueError(f"Layer '{par['layer']}' was not found in modality '{par['modality']}'.")
ValueError: Layer 'counts' was not found in modality 'atac'.
=============================== warnings summary ===============================
tmp/viash-run-lsi-UhQpoQ.py::test_lsi
tmp/viash-run-lsi-UhQpoQ.py::test_lsi
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_raises
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_raises
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_overwrite
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_overwrite
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_overwrite
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_overwrite
/usr/local/lib/python3.9/site-packages/anndata/_core/anndata.py:453: PendingDeprecationWarning: The dtype argument will be deprecated in anndata 0.10.0
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tmp/viash-run-lsi-UhQpoQ.py::test_select_highly_variable_column - subp...
FAILED tmp/viash-run-lsi-UhQpoQ.py::test_selecting_input_layer - subprocess.C...
============= 2 failed, 6 passed, 8 warnings in 117.84s (0:01:57) ==============
====================================================================
ERROR! Only 0 out of 1 test scripts succeeded!
Unexpected error occurred! If you think this is a bug, please post
create an issue at https://github.com/viash-io/viash/issues containing
a reproducible example and the stack trace below.
Does the same error show up when you run it locally?
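In case it helps when reproducing this locally: the short test summary truncates both failures to `subp...`, so the actual `ValueError` is only visible further up in the captured stderr. Below is a minimal sketch of one way to put the child's stderr into the failure message itself, using only the standard library. `run_and_report` is a hypothetical helper (not part of the component or of viash), and the failing command is a stand-in child Python process rather than the real `lsi` executable.

```python
import subprocess
import sys


def run_and_report(cmd_args):
    """Run a command; on failure, raise an AssertionError that includes its stderr.

    Hypothetical helper for illustration only.
    """
    # capture_output=True collects stdout/stderr instead of leaving them as None;
    # text=True decodes them to str.
    result = subprocess.run(cmd_args, capture_output=True, text=True)
    if result.returncode != 0:
        raise AssertionError(
            f"Command exited with status {result.returncode}.\n"
            f"stderr:\n{result.stderr}"
        )
    return result


# Stand-in for the component executable (assumption): a child Python process
# that fails with the same message as the second traceback in the log.
failing_cmd = [
    sys.executable,
    "-c",
    "raise ValueError(\"Layer 'counts' was not found in modality 'atac'.\")",
]

try:
    run_and_report(failing_cmd)
except AssertionError as err:
    message = str(err)
    print(message)
```

With something like this in the test, the reason for the non-zero exit status would show up directly in the assertion message instead of only in the "Captured stderr call" section.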
Hi Sarah! Just checking in with this PR. When would you have some time to look at the issue I posted?
Sorry for the very late reply! Yes, the errors make sense: I haven't checked the new test data and only ran the tests on my old test data. I'm currently drowning in work, especially because of exam season. Please feel free to correct it yourself to speed up the process! So sorry, I can't really say when I will have time to resolve the issue myself.
I can take it over :) Once I swim out of my other work as well...
Changelog
Added LSI component
Issue ticket number and link
398
Checklist before requesting a review
[x] I have performed a self-review of my code
[x] Conforms to the Contributor's guide
Check the correct box. Does this PR contain:
[x] Proposed changes are described in the CHANGELOG.md
[x] CI tests succeed!