Closed liangchengbo closed 4 months ago
Does this error occur when running the test FASTA file and test database?
Please post the full commands you used
This error also occurs when running the test FASTA file and test database. Here are my commands.
/mnt/z/lcb/fcs-gx/fcs-gx-release/fcs-gx-release/scripts/run_gx.py \
    --fasta /mnt/z/lcb/genoms/bharal/0300.purge/purged.fa \
    --tax-id 1204301 \
    --gx-db /mnt/z/lcb/fcs-gx/fcs-gx-release/fcs-gx-release/database/gxdb/all \
    --out-dir /mnt/z/lcb/genoms/bharal/0301.fcs-gx \
    --bin-dir /mnt/z/lcb/fcs-gx/fcs-gx-release/fcs-gx-release
The message "Fatal error: index.cpp:484 in from_stream(...): Unrecognized file content." indicates that the gxdb path either does not contain the GX database or that its content is corrupted.
What is the output of the following?
cd /mnt/z/lcb/fcs-gx/fcs-gx-release/fcs-gx-release/database/gxdb/
ls -l *
Post the files and file sizes from the ls command. It should look similar to:
-rw-rw-r-- 1 user group 187 Jan 24 2023 all.README.txt
-rw-rw-r-- 1 user group 8887448 Jan 24 2023 all.assemblies.tsv
-rw-rw-r-- 1 user group 8241107 Jan 24 2023 all.blast_div.tsv.gz
-rw-rw-r-- 1 user group 321216733352 Jan 24 2023 all.gxi
-rw-rw-r-- 1 user group 177317125807 Jan 24 2023 all.gxs
-rw-rw-r-- 1 user group 1652 Jan 31 2023 all.manifest
-rw-rw-r-- 1 user group 59 Jan 24 2023 all.meta.jsonl
-rw-rw-r-- 1 user group 22549956 Jan 24 2023 all.seq_info.tsv.gz
-rw-rw-r-- 1 user group 6385518 Jan 24 2023 all.taxa.tsv
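As a quick first pass before the full integrity check, the on-disk sizes can be compared against the reference listing above. This is only a minimal sketch: the function name is hypothetical, and the sizes are copied from the January 2023 listing in this thread, so they may not match other database releases (the manifest remains the authoritative source).

```python
from pathlib import Path

# Sizes in bytes copied from the reference listing above (Jan 2023 release).
# Assumption: other releases of the database will have different sizes.
EXPECTED_SIZES = {
    "all.README.txt": 187,
    "all.assemblies.tsv": 8887448,
    "all.blast_div.tsv.gz": 8241107,
    "all.gxi": 321216733352,
    "all.gxs": 177317125807,
    "all.manifest": 1652,
    "all.meta.jsonl": 59,
    "all.seq_info.tsv.gz": 22549956,
    "all.taxa.tsv": 6385518,
}


def find_size_mismatches(db_dir, expected=EXPECTED_SIZES):
    """Return {name: (expected_size, actual_size_or_None)} for differing files.

    A value of None for the actual size means the file is missing.
    """
    mismatches = {}
    for name, want in expected.items():
        path = Path(db_dir) / name
        actual = path.stat().st_size if path.is_file() else None
        if actual != want:
            mismatches[name] = (want, actual)
    return mismatches
```

Calling find_size_mismatches("/path/to/gxdb") returns an empty dict when every listed file is present with the listed size. A matching size does not prove the content is intact, so the checksum verification is still needed.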
Also, please verify the integrity of the database as follows (provide the actual paths for the --dir and --mft arguments):
dist/sync_files.py check --dir=/mnt/z/lcb/fcs-gx/fcs-gx-release/fcs-gx-release/database/gxdb/ --mft=/mnt/z/lcb/fcs-gx/fcs-gx-release/fcs-gx-release/database/gxdb/all.manifest
The expected output is:
===============================================================================
/mnt/z/lcb/fcs-gx/fcs-gx-release/fcs-gx-release/database/gxdb/ is up-to-date with /mnt/z/lcb/fcs-gx/fcs-gx-release/fcs-gx-release/database/gxdb/
In the command you posted in your most recent comment, it appears you are using the 'all' database, not the test database:
--gx-db /mnt/z/lcb/fcs-gx/fcs-gx-release/fcs-gx-release/database/gxdb/all
It may be that you did try using the test sets and are just copying the original command you used. Please verify the contents and integrity of the test database using the ls and sync_files.py commands, respectively.
I've verified the database. I ran my command with the integrated database, but the same error was still reported. The command in the tutorial is ". /dist/run_gx". When I run this command, the program fails to run and displays the "--help" content. So I ran ". /scripts/run_gx.py" instead and got the aforementioned error. Is this the cause of the errors I reported? How should I modify it?
Should I extract the two .gz files from the database before running?
I'm reopening this issue because you're seeing the same error message. I've copied your comments below, each followed by my response:
I've verified the database. I ran my command with the integrated database, but the same error was still reported.
Please paste the output of ls -l * in your GX database folder. I would like to see the contents and file sizes. Also, please state whether you used sync_files.py check to verify the database contents and whether you saw a message similar to the one in my comment above.
The command in the tutorial is ". /dist/run_gx". When I run this command, the program fails to run and displays the "--help" content. So I ran ". /scripts/run_gx.py" instead and got the aforementioned error. Is this the cause of the errors I reported?
You should be able to run either dist/run_gx or scripts/run_gx.py so long as --bin-dir is set to the dist directory containing the proper executables. Can you run ls -l * in the folder you are setting as --bin-dir? I don't think this is the source of the error based on the message you are seeing, but I would like to check.
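The --bin-dir check above can also be scripted. The sketch below is hypothetical and only looks for an executable named gx, because that is the one binary the error log in this thread shows being invoked from the bin directory; any other names passed in would be assumptions about what the release ships.

```python
import os
from pathlib import Path


def missing_executables(bin_dir, names=("gx",)):
    """Return the names that are absent or not executable inside bin_dir.

    Only "gx" is taken from the error log in this thread; extend `names`
    at your own risk, since the full list of required binaries is not
    stated here.
    """
    missing = []
    for name in names:
        candidate = Path(bin_dir) / name
        if not (candidate.is_file() and os.access(candidate, os.X_OK)):
            missing.append(name)
    return missing
```

An empty list from missing_executables("/path/to/fcs-gx/dist") suggests the directory is at least a plausible --bin-dir value.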
Should I extract the two .gz files from the database before running?
No, this is not needed.
Here is the information about the database. I changed the database directory to /mnt/z/lcb/fcs-gx/gxdb/all/ . If the error is due to the integrity of the database, can you provide an alternative way to download the database? I have tried several times to download the database using the script in fcs-gx, and I also downloaded it by FTP from "ftp.ncbi.nlm.nih.gov" several times, but I have never been able to resolve the reported error.
(fcs-gx) lcb@zooeco-R282-Z93:/mnt/z/lcb/fcs-gx/gxdb/all$ ls -l *
-rwxrwxrwx 1 lcb lcb 8887448 Jul 21 21:01 all.assemblies.tsv
-rwxrwxrwx 1 lcb lcb 8241107 Jul 21 21:01 all.blast_div.tsv.gz
-rwxrwxrwx 1 lcb lcb 321216733352 Jul 22 01:13 all.gxi
-rwxrwxrwx 1 lcb lcb 177389875999 Jul 21 22:15 all.gxs
-rwxrwxrwx 1 lcb lcb 1652 Jul 21 21:01 all.manifest
-rwxrwxrwx 1 lcb lcb 59 Jul 21 21:01 all.meta.jsonl
-rwxrwxrwx 1 lcb lcb 192 Jul 21 21:01 all.README.txt
-rwxrwxrwx 1 lcb lcb 22549956 Jul 21 21:01 all.seq_info.tsv.gz
-rwxrwxrwx 1 lcb lcb 6385518 Jul 21 21:01 all.taxa.tsv
(fcs-gx) lcb@zooeco-R282-Z93:/mnt/z/lcb/fcs-gx$ /mnt/z/lcb/fcs-gx/dist/sync_files check --dir=/mnt/z/lcb/fcs-gx/gxdb/all/ --mft=/mnt/z/lcb/fcs-gx/gxdb/all/all.manifest
===============================================================================
Source: /mnt/z/lcb/fcs-gx/gxdb/all
Destination: /mnt/z/lcb/fcs-gx/gxdb/all
Space check: Available:11.06TiB; Existing:464.41GiB; Incoming:464.34GiB; Delta:-69.37MiB
Computing md5 hash of /mnt/z/lcb/fcs-gx/gxdb/all/all.meta.jsonl ... c2096cdb8106d44a310052b06a23836c
Skipping existing 59B all.meta.jsonl
/mnt/z/lcb/fcs-gx/gxdb/all/all.README.txt - file-size changed.
Requires transfer: 187B all.README.txt
Computing md5 hash of /mnt/z/lcb/fcs-gx/gxdb/all/all.taxa.tsv ... c94d1fc80f81dbbf30b114d4cdaf29ad
Skipping existing 6.09MiB all.taxa.tsv
Skipping existing 7.86MiB all.blast_div.tsv.gz
Skipping existing 8.48MiB all.assemblies.tsv
Computing md5 hash of /mnt/z/lcb/fcs-gx/gxdb/all/all.seq_info.tsv.gz ... 6a760eed5a94aaf46d4dd8c75f370875
Skipping existing 21.51MiB all.seq_info.tsv.gz
/mnt/z/lcb/fcs-gx/gxdb/all/all.gxs - file-size changed.
Requires transfer: 165.14GiB all.gxs
Computing md5 hash of /mnt/z/lcb/fcs-gx/gxdb/all/all.gxi ... 1b77edf28321975b3b436466fa161f7d
/mnt/z/lcb/fcs-gx/gxdb/all/all.gxi - checksum changed.
Requires transfer: 299.16GiB all.gxi
The presence of 'checksum changed' means that your downloaded files are likely corrupted, which would explain the GX error.
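To illustrate what a size-plus-checksum comparison of this kind involves, here is a minimal sketch. It is not the actual sync_files.py implementation, and the manifest format is not shown in this thread, so the expected size and md5 are passed in directly. The labels "file-size changed" and "checksum changed" mirror the messages in the output above, while "missing" and "ok" are made up for this example.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in 1 MiB chunks
    so that very large files (such as all.gxi) never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_file(path, expected_size, expected_md5):
    """Classify a file against its expected size and md5 checksum.

    The size is compared first because it is cheap; the checksum is only
    computed when the size already matches.
    """
    path = Path(path)
    if not path.is_file():
        return "missing"
    if path.stat().st_size != expected_size:
        return "file-size changed"
    if md5_of_file(path) != expected_md5.lower():
        return "checksum changed"
    return "ok"
```

In the output above, all.gxi has the expected size but a different checksum, which is the case a size-only comparison would miss.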
Please perform the following:
Download the test gx database:
sync_files.py get --mft=https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/database/test-only/test-only.manifest --dir=/path/to/test_gxdb
===============================================================================
Source: https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/database/test-only
Destination: /path/to/test_gxdb
Warning: aria2c is not accessible - will use curl instead (may be much slower).
Space check: Available:3.07TiB; Existing:0B; Incoming:4.29GiB; Delta:4.29GiB
...
Removing /path/to/test_gxdb.lockfile.
Done.
Verify the test gx database:
sync_files.py check --dir=/path/to/test_gxdb --mft=/path/to/test_gxdb/test-only.manifest
===============================================================================
/path/to/test_gxdb is up-to-date with /path/to/test_gxdb.
Download an example FASTA and run:
curl -LO https://zenodo.org/records/10932013/files/FCS_combo_test.fa
run_gx.py --fasta=FCS_combo_test.fa --tax-id=4932 --gx-db=/path/to/test_gxdb --bin-dir=/path/to/fcs-gx/dist/
...
...
fcs_gx_report.txt action summary:
---------------------------------
seqs bases
----- ----------
TOTAL 2 2000
----- ----- ----------
EXCLUDE 2 2000
I just got this to work.
When I download the database with the script, the following error is reported. Is this due to my environment configuration? I uploaded the database to my server after downloading it with ftp. Is my environment configuration something that will affect the integrity of my database?
  File "/mnt/z/lcb/fcs-gx/./dist/sync_files", line 725, in main
    transfer_file(mi, src_mft_dir, work_dir)
  File "/mnt/z/lcb/fcs-gx/./dist/sync_files", line 588, in transfer_file
    subprocess.run(["curl", "-L", "-C", "-", "--retry", "5", "-o", tmp_file_path, url], check=True)
  File "/home/lcb/miniconda3/envs/fcs-gx/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['curl', '-L', '-C', '-', '--retry', '5', '-o', PosixPath('/mnt/z/lcb/fcs-gx/gxdb/all_py.in_progress/all.gxs.part'), 'https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/database/latest/all.gxs']' returned non-zero exit status 56.
Removing /mnt/z/lcb/fcs-gx/gxdb/all_py.lockfile.
Return code 56 in curl indicates a "Failure in receiving network data", meaning the transfer was interrupted or failed before it could be completed. This error typically occurs when there's a problem with the connection or data transfer, and it can happen for various reasons, such as a connection timeout.
Therefore, an alternative to sync_files.py get is to retrieve the db files from FTP using an alternate method, but you should still check the integrity of the db files using sync_files.py check.
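One possible alternate method is to wrap the same curl invocation seen in the traceback above in an outer retry loop, so that an interrupted transfer resumes from the partial file instead of aborting. This is a sketch under assumptions: the function name, attempt count, and wait time are hypothetical, and only the curl flags are taken from the traceback.

```python
import subprocess
import time


def download_with_resume(url, dest, max_attempts=5, wait_seconds=30,
                         run=subprocess.run, sleep=time.sleep):
    """Download url to dest with curl, resuming after failures.

    The curl flags match the command in the traceback: -L follows
    redirects and "-C -" asks curl to continue from the size of the
    existing partial file. Returns the number of attempts used.
    `run` and `sleep` are injectable so the loop can be tested offline.
    """
    cmd = ["curl", "-L", "-C", "-", "--retry", "5", "-o", str(dest), url]
    for attempt in range(1, max_attempts + 1):
        try:
            run(cmd, check=True)
            return attempt
        except subprocess.CalledProcessError as err:
            if attempt == max_attempts:
                raise  # give up and surface the last curl failure
            print(f"curl exited with status {err.returncode}; "
                  f"retrying ({attempt}/{max_attempts})")
            sleep(wait_seconds)
```

Whatever download method is used, a completed transfer is not proof of an intact file, so the files should still be verified afterwards.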
In your most recent comment, the error message suggests you are trying to get the all database, when I specifically recommended you retrieve, verify and screen with the test-only database first. Please do that. The test-only database is small, so if that works ok then it is more likely a connection timeout issue versus the other reasons mentioned above.
Additionally, can you run GX in a Docker or Singularity container, following the instructions on our wiki? In other words, is there a specific reason you are trying to run GX outside of a container? The container has a resumable database download mechanism.
My error has been fixed. As you said, the error was caused by a damaged database. Thank you very much for your help! I wish you much success in your future endeavors.
Glad you got it to work!
While running run_gx.py, the following error is reported. Can you help me solve it? Thanks!
Fatal error: index.cpp:484 in from_stream(...): Unrecognized file content.
Warning: missing header '##[["GX hits",2,1]]'
Fatal error: taxify.cpp:350 in make_run_info_json(...): Assertion failed: agg_cvg <= 1
Error: Process failed with retcode 1: ['nice', '-n19', '/mnt/z/lcb/fcs-gx/fcs-gx-release/fcs-gx-release/gx', 'align', '--gx-db=/mnt/z/lcb/fcs-gx/fcs-gx-release/fcs-gx-release/database/gxdb/all.gxi', '--repeats-basis-fa=/dev/fd/6'])
Traceback (most recent call last):
  File "/mnt/z/lcb/fcs-gx/fcs-gx-release/fcs-gx-release/scripts/run_gx.py", line 1114, in <module>
    main()
  File "/mnt/z/lcb/fcs-gx/fcs-gx-release/fcs-gx-release/scripts/run_gx.py", line 1089, in main
    run_gx_pipeline(args)
  File "/mnt/z/lcb/fcs-gx/fcs-gx-release/fcs-gx-release/scripts/run_gx.py", line 732, in run_gx_pipeline
    with ProcessPipeline() as p_main:
  File "/mnt/z/lcb/fcs-gx/fcs-gx-release/fcs-gx-release/scripts/run_gx.py", line 312, in __exit__
    self.wait()
  File "/mnt/z/lcb/fcs-gx/fcs-gx-release/fcs-gx-release/scripts/run_gx.py", line 302, in wait
    assert num_errors == 0, "Had errors."
           ^^^^^^^^^^^^^^^
AssertionError: Had errors.