Documentation to address peer-reviewers and the following changes:
[2024.4.30] - Added concatenate_files.py, which concatenates files (including a mix of compressed and uncompressed files) given as arguments, a list file, or a glob. This is needed because Unix limits the length of a command's argument list (e.g., cat *.fasta > output.fasta fails when *.fasta expands to 50k files).
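The glob mode described above can be sketched as follows. This is a minimal illustration, not the code in concatenate_files.py: the function names `open_maybe_gzip` and `concatenate` are hypothetical, only gzip is handled, and compression is detected from the gzip magic bytes rather than the file extension. Because the pattern is expanded inside Python, the shell never builds the long argument list.

```python
import glob
import gzip
import shutil


def open_maybe_gzip(path):
    """Open a file for binary reading, decompressing it if it starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rb") if is_gzip else open(path, "rb")


def concatenate(pattern, output_path):
    """Concatenate every file matching `pattern` into an uncompressed `output_path`."""
    # The pattern is expanded here, so the shell's argument limit does not apply.
    paths = sorted(glob.glob(pattern))
    with open(output_path, "wb") as out:
        for path in paths:
            with open_maybe_gzip(path) as src:
                shutil.copyfileobj(src, out)  # stream in chunks, no full read into memory
    return len(paths)
```

Usage would look like `concatenate("genomes/*.fasta*", "output.fasta")`. The output path should not itself match the pattern if the output file already exists.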
[2024.4.29] - Added /volumes/workspace/ directory to Docker containers for situations when your input and output directories are the same.
[2024.4.29] - featureCounts can only handle 64 threads at a time, so min(64, opts.n_jobs) is now used in all the modules/scripts that run featureCounts commands.
[2024.4.18] - Developed a faster implementation of KofamScan called PyKofamSearch, which leverages PyHmmer. This will be used in future versions of VEBA.
[2024.3.26] - Added --metaeuk_split_memory_limit to metaeuk_wrapper.py.
[2024.3.26] - Added -d/--genome_identifier_directory_index to scaffolds_to_bins.py for directories structured as path/to/genomes/bin_a/reference.fasta, in which case you would use -d -2.
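The -d -2 example reads naturally as a Python-style negative index into the path components. The sketch below assumes that interpretation; `genome_identifier` is a hypothetical helper, not a function from scaffolds_to_bins.py.

```python
from pathlib import PurePosixPath


def genome_identifier(path, directory_index=-2):
    """Return the path component at `directory_index` (assumed Python-style indexing)."""
    parts = PurePosixPath(path).parts
    # For path/to/genomes/bin_a/reference.fasta, index -1 is the file name
    # and index -2 is the directory that holds it.
    return parts[directory_index]


print(genome_identifier("path/to/genomes/bin_a/reference.fasta"))  # bin_a
```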
[2024.3.26] - Added --minimum_af to edgelist_to_clusters.py with an option to accept 4-column input: [id_1]<tab>[id_2]<tab>[weight]<tab>[alignment_fraction]. global_clustering.py, local_clustering.py, and cluster.py now use this by default with --af_threshold 30.0. To retain the previous behavior, use --af_threshold 0.0.
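A rough sketch of alignment-fraction filtering on the 4-column input is shown below. It makes two assumptions: the alignment fraction is on a 0 to 100 scale, and an edge is kept when its value is greater than or equal to the threshold. The function `filter_edges` is hypothetical and is not the code in edgelist_to_clusters.py.

```python
import csv
import io


def filter_edges(handle, minimum_af=30.0):
    """Read tab-separated edges and drop those whose alignment fraction is below `minimum_af`."""
    edges = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        id_1, id_2, weight = row[0], row[1], float(row[2])
        # A 3-column row has no alignment fraction, so it is never filtered.
        if len(row) >= 4 and float(row[3]) < minimum_af:
            continue
        edges.append((id_1, id_2, weight))
    return edges


table = "a\tb\t98.5\t80.0\nb\tc\t97.0\t12.5\n"
print(filter_edges(io.StringIO(table)))                  # only the a-b edge passes
print(filter_edges(io.StringIO(table), minimum_af=0.0))  # both edges pass
```

Under these assumptions, a threshold of 0.0 keeps every edge, which matches the stated way to retain the previous behavior.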
[2024.3.18] - edgelist_to_clusters.py only includes edges where both nodes are in the identifiers set. If --identifiers is provided, only those identifiers are used. If not, all nodes are included.
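The identifier rule can be illustrated with a short sketch. `restrict_edges` is a hypothetical helper written for this note, and `edges` is assumed to be a list of tuples whose first two fields are the node identifiers.

```python
def restrict_edges(edges, identifiers=None):
    """Keep only edges whose two nodes are both in the identifier set."""
    if identifiers is None:
        # No identifiers supplied: every node seen in the edge list is allowed.
        identifiers = {node for edge in edges for node in edge[:2]}
    else:
        identifiers = set(identifiers)
    kept = [edge for edge in edges if edge[0] in identifiers and edge[1] in identifiers]
    return kept, identifiers


edges = [("a", "b", 99.0), ("b", "c", 97.0)]
print(restrict_edges(edges, identifiers=["a", "b"])[0])  # the b-c edge is dropped
print(restrict_edges(edges)[0])                          # all edges are kept
```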
[2024.3.18] - Added --export_representatives argument for edgelist_to_clusters.py to output a table with [id_node]<tab>[id_cluster]<tab>[intra-cluster_connectivity]<tab>[representative]. This information is also included in the nx.Graph objects.
[2024.3.18] - Changed singleton weight to np.nan instead of np.inf for edgelist_to_clusters.py to allow for representative calculations.
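The sketch below shows one plausible way representatives could be chosen, and why a NaN singleton weight is easier to handle than an infinite one. It is not the implementation in edgelist_to_clusters.py. It assumes that intra-cluster connectivity is the sum of a node's edge weights to other members of its cluster, and that the representative is the member with the highest connectivity. `pick_representatives` is a hypothetical name, and `math.nan` stands in for `np.nan` (both are the same float value) to keep the example dependency-free.

```python
import math
from collections import defaultdict


def pick_representatives(edges, node_to_cluster):
    """Return (connectivity per node, representative per cluster) under the stated assumptions."""
    connectivity = {node: 0.0 for node in node_to_cluster}
    connected = set()
    for id_1, id_2, weight in edges:
        # Count only edges between two distinct nodes of the same cluster.
        if id_1 != id_2 and node_to_cluster[id_1] == node_to_cluster[id_2]:
            connectivity[id_1] += weight
            connectivity[id_2] += weight
            connected.update((id_1, id_2))
    for node in node_to_cluster:
        if node not in connected:
            connectivity[node] = math.nan  # singleton: undefined rather than infinite

    members = defaultdict(list)
    for node, cluster in node_to_cluster.items():
        members[cluster].append(node)

    def rank(node):
        # NaN never compares greater, so rank it below every real value.
        value = connectivity[node]
        return -math.inf if math.isnan(value) else value

    representatives = {cluster: max(nodes, key=rank) for cluster, nodes in members.items()}
    return connectivity, representatives


edges = [("a", "b", 2.0), ("b", "c", 3.0)]
clusters = {"a": "c1", "b": "c1", "c": "c1", "d": "c2"}
connectivity, representatives = pick_representatives(edges, clusters)
print(representatives)
```

With an infinite singleton weight, any sum or maximum that touched a singleton would itself become infinite. A NaN can be detected and skipped explicitly, as `rank` does here.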
Added uniprot_to_enzymes.py, which reformats tables and FASTA files from https://www.uniprot.org/uniprotkb?query=ec%3A*