jkimlab / DESCHRAMBLER

18 stars 8 forks source link

Running on simulated genomes #6

Closed muffato closed 3 years ago

muffato commented 3 years ago

Hello,

I want to test DESCHRAMBLER on simulated genomes (made as lists of genes), and I would like to know how I can make it not require the genome alignments (no sequences are simulated).

I have generated a Conserved.Segments file from my simulations. Like in sim.data.new.tar.gz (the simulations in your 2017 PNAS paper), each gene takes 1kb of space in the genome:

>0
Gallus_gallus.16:415000-416000 -
Monodelphis_domestica.4:5000-6000 -
Canis_lupus_familiaris.15:651000-652000 +
Homo_sapiens.13:648000-649000 +
Mus_musculus.13:206000-207000 +

>1
Gallus_gallus.2:869000-870000 +
Monodelphis_domestica.8:4310000-4311000 -
Canis_lupus_familiaris.28:270000-271000 -
Homo_sapiens.23:2223000-2224000 +
Mus_musculus.8:833000-834000 +

(...)

I hacked Makefile.SFs to use his file instead, and I have changed the RESOLUTION parameter to 1kb, but it still tries to access the alignments to estimate the breakpoint distances, and the resulting reconstructions are 1 APCF per entry in Conserved.Segments - i.e. no reconstruction.

Could you please help ?

Regards, Matthieu

jkimlab commented 3 years ago

Hi Matthieu,

Can you provide the following additional information?

  1. List of files and their size in the “SFs” directory.

  2. Content of the parameter file

  3. Program output log

Best,

Jaebum

On May 17, 2021, at 8:07 PM, Matthieu Muffato @.***> wrote:

Hello,

I want to test DESCHRAMBLER on simulated genomes (made as lists of genes), and I would like to know how I can make it not require the genome alignments (no sequences are simulated).

I have generated a Conserved.Segments file from my simulations. Like in sim.data.new.tar.gz (the simulations in your 2017 PNAS paper), each gene takes 1kb of space in the genome:

0 Gallus_gallus.16:415000-416000 - Monodelphis_domestica.4:5000-6000 - Canis_lupus_familiaris.15:651000-652000 + Homo_sapiens.13:648000-649000 + Mus_musculus.13:206000-207000 +

1 Gallus_gallus.2:869000-870000 + Monodelphis_domestica.8:4310000-4311000 - Canis_lupus_familiaris.28:270000-271000 - Homo_sapiens.23:2223000-2224000 + Mus_musculus.8:833000-834000 +

(...) I hacked Makefile.SFs to use his file instead, and I have changed the RESOLUTION parameter to 1kb, but it still tries to access the alignments to estimate the breakpoint distances, and the resulting reconstructions are 1 APCF per entry in Conserved.Segments - i.e. no reconstruction.

Could you please help ?

Regards, Matthieu

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/jkimlab/DESCHRAMBLER/issues/6, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEU6PV6QJQHUZUVMHX7LYXLTOD2HXANCNFSM45AHORSQ.

muffato commented 3 years ago

Sure, thank you for the quick reply !

1.

704     APCFs.1K/SFs
176953  APCFs.1K/SFs/Mus_musculus.joins
49      APCFs.1K/SFs/ingroup.txt
2821    APCFs.1K/SFs/Makefile
93      APCFs.1K/SFs/bpdist.txt
160     APCFs.1K/SFs/SFs_Gallus_gallus
2821    APCFs.1K/SFs/SFs_Gallus_gallus/Makefile
0       APCFs.1K/SFs/SFs_Gallus_gallus/Building.Blocks
590     APCFs.1K/SFs/SFs_Gallus_gallus/config.file
160     APCFs.1K/SFs/SFs_Canis_lupus_familiaris
2821    APCFs.1K/SFs/SFs_Canis_lupus_familiaris/Makefile
0       APCFs.1K/SFs/SFs_Canis_lupus_familiaris/Building.Blocks
590     APCFs.1K/SFs/SFs_Canis_lupus_familiaris/config.file
249059  APCFs.1K/SFs/Genomes.Order.new
0       APCFs.1K/SFs/block_consscores.txt
759908  APCFs.1K/SFs/adjacencies.prob
248981  APCFs.1K/SFs/Genomes.Order
2518304 APCFs.1K/SFs/Conserved.Segments
171277  APCFs.1K/SFs/Canis_lupus_familiaris.joins
141477  APCFs.1K/SFs/Gallus_gallus.joins
158792  APCFs.1K/SFs/Monodelphis_domestica.joins
362332  APCFs.1K/SFs/block_list.txt
184989  APCFs.1K/SFs/Homo_sapiens.joins
128     APCFs.1K/SFs/SFs_Monodelphis_domestica
2821    APCFs.1K/SFs/SFs_Monodelphis_domestica/Makefile
590     APCFs.1K/SFs/SFs_Monodelphis_domestica/config.file
36      APCFs.1K/SFs/outgroup.txt
587     APCFs.1K/SFs/config.file
128     APCFs.1K/SFs/SFs_Mus_musculus
2821    APCFs.1K/SFs/SFs_Mus_musculus/Makefile
590     APCFs.1K/SFs/SFs_Mus_musculus/config.file

2.

# Reference species
REFSPC=Homo_sapiens

# Output directory
OUTPUTDIR=APCFs.1K
#OUTPUTDIR=Out_RACA.anctest

# Block resolution (bp)
RESOLUTION=1000
#RESOLUTION=300000

# Newick tree file
# Refer to the sample file 'tree.txt'.
TREEFILE=tree.txt

# Minimum adjacency scores
MINADJSCR=0.0001

# Config and make files for syntenic fragment construction
# Refer to the sample files 'config.SFs' and 'Makefile.SFs'.
# You need to change settings in the sample configuration file (config.SFs) according to your data
# You don't need to change anything in the Makefile.SFs.
CONFIGSFSFILE=config.SFs
MAKESFSFILE=Makefile.SFs

The tree tree.txt has no branch lengths, but I can provide some if that's required:

((((Homo_sapiens,Mus_musculus),Canis_lupus_familiaris)@,Monodelphis_domestica),Gallus_gallus);

3.


## Constructing syntenic fragments ##
cp -p /Users/mm49/workspace/simu2/Conserved.Segments Conserved.Segments
======== creating input files for inferring CARs ========
/Users/mm49/workspace/DESCHRAMBLER/code/makeBlocks/createGenomeFile config.file Conserved.Segments > Genomes.Order
/Users/mm49/workspace/DESCHRAMBLER/script/create_blocklist.pl Homo_sapiens APCFs.1K/SFs
/Users/mm49/workspace/DESCHRAMBLER/script/wrap_recon_apcf.pl /Users/mm49/workspace/DESCHRAMBLER/matt/tree.txt 1000 Homo_sapiens 0.0001 APCFs.1K/SFs APCFs.1K
Error - Could not open net dir /Users/mm49/workspace/DESCHRAMBLER/examples/chainNet/Canis_lupus_familiaris/Homo_sapiens/net
Cannot open Canis_lupus_familiaris.processed.segs.
make: *** [Building.Blocks] Error 1
readline() on closed filehandle F at /Users/mm49/workspace/DESCHRAMBLER/script/estimate_bpdist4.pl line 61.
Error - Could not open net dir /Users/mm49/workspace/DESCHRAMBLER/examples/chainNet/Homo_sapiens/Mus_musculus/net
- processing Mus_musculus.raw.segs
Cannot open Mus_musculus.raw.segs.
make: *** [Grab.Data] Error 1
readline() on closed filehandle F at /Users/mm49/workspace/DESCHRAMBLER/script/estimate_bpdist4.pl line 61.
Error - Could not open net dir /Users/mm49/workspace/DESCHRAMBLER/examples/chainNet/Gallus_gallus/Homo_sapiens/net
Cannot open Gallus_gallus.processed.segs.
make: *** [Building.Blocks] Error 1
readline() on closed filehandle F at /Users/mm49/workspace/DESCHRAMBLER/script/estimate_bpdist4.pl line 61.
Error - Could not open net dir /Users/mm49/workspace/DESCHRAMBLER/examples/chainNet/Homo_sapiens/Monodelphis_domestica/net
- processing Monodelphis_domestica.raw.segs
Cannot open Monodelphis_domestica.raw.segs.
make: *** [Grab.Data] Error 1
readline() on closed filehandle F at /Users/mm49/workspace/DESCHRAMBLER/script/estimate_bpdist4.pl line 61.
TREE ((((Homo_sapiens,Mus_musculus),Canis_lupus_familiaris)@,Monodelphis_domestica),Gallus_gallus);

UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/workspace/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94.
14984 Homo_sapiens /Users/mm49/workspace/DESCHRAMBLER/matt/tree.txt APCFs.1K/SFs/bpdist.txt
Estimate JC parameter: 0.0001
readLeafGenomes: Homo_sapiens
readLeafGenomes: Mus_musculus
readLeafGenomes: Canis_lupus_familiaris
T=14984
Initializing Homo_sapiens (ingroup)
Initializing Mus_musculus (ingroup)
Initializing Canis_lupus_familiaris (ingroup)
Initializing Gallus_gallus (outgroup)
Initializing Monodelphis_domestica (outgroup)
Computing posterior probabilities ...
Died at /Users/mm49/workspace/DESCHRAMBLER/script/refine_adjprob.pl line 26, <F> line 207.
Minimum weight = 0.0001
Conservation score file = APCFs.1K/SFs/block_consscores.txt
UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/workspace/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94.
UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/workspace/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94.
- Totally 14983 APCFs

Some comments:

jkimlab commented 3 years ago

The first problem is the absence of branch lengths. It is required information. Please try again after adding the branch lengths.

On May 18, 2021, at 12:17 AM, Matthieu Muffato @.***> wrote:

Sure, thank you for the quick reply !

704 APCFs.1K/SFs 176953 APCFs.1K/SFs/Mus_musculus.joins 49 APCFs.1K/SFs/ingroup.txt 2821 APCFs.1K/SFs/Makefile 93 APCFs.1K/SFs/bpdist.txt 160 APCFs.1K/SFs/SFs_Gallus_gallus 2821 APCFs.1K/SFs/SFs_Gallus_gallus/Makefile 0 APCFs.1K/SFs/SFs_Gallus_gallus/Building.Blocks 590 APCFs.1K/SFs/SFs_Gallus_gallus/config.file 160 APCFs.1K/SFs/SFs_Canis_lupus_familiaris 2821 APCFs.1K/SFs/SFs_Canis_lupus_familiaris/Makefile 0 APCFs.1K/SFs/SFs_Canis_lupus_familiaris/Building.Blocks 590 APCFs.1K/SFs/SFs_Canis_lupus_familiaris/config.file 249059 APCFs.1K/SFs/Genomes.Order.new 0 APCFs.1K/SFs/block_consscores.txt 759908 APCFs.1K/SFs/adjacencies.prob 248981 APCFs.1K/SFs/Genomes.Order 2518304 APCFs.1K/SFs/Conserved.Segments 171277 APCFs.1K/SFs/Canis_lupus_familiaris.joins 141477 APCFs.1K/SFs/Gallus_gallus.joins 158792 APCFs.1K/SFs/Monodelphis_domestica.joins 362332 APCFs.1K/SFs/block_list.txt 184989 APCFs.1K/SFs/Homo_sapiens.joins 128 APCFs.1K/SFs/SFs_Monodelphis_domestica 2821 APCFs.1K/SFs/SFs_Monodelphis_domestica/Makefile 590 APCFs.1K/SFs/SFs_Monodelphis_domestica/config.file 36 APCFs.1K/SFs/outgroup.txt 587 APCFs.1K/SFs/config.file 128 APCFs.1K/SFs/SFs_Mus_musculus 2821 APCFs.1K/SFs/SFs_Mus_musculus/Makefile 590 APCFs.1K/SFs/SFs_Mus_musculus/config.file

Reference species

REFSPC=Homo_sapiens

Output directory

OUTPUTDIR=APCFs.1K

OUTPUTDIR=Out_RACA.anctest

Block resolution (bp)

RESOLUTION=1000

RESOLUTION=300000

Newick tree file

Refer to the sample file 'tree.txt'.

TREEFILE=tree.txt

Minimum adjacency scores

MINADJSCR=0.0001

Config and make files for syntenic fragment construction

Refer to the sample files 'config.SFs' and 'Makefile.SFs'.

You need to change settings in the sample configuration file (config.SFs) according to your data

You don't need to change anything in the Makefile.SFs.

CONFIGSFSFILE=config.SFs MAKESFSFILE=Makefile.SFs The tree tree.txt has no branch lengths, but I can provide some if that's required:

((((Homo_sapiens,Mus_musculus),Canis_lupus_familiaris)@,Monodelphis_domestica),Gallus_gallus);

Constructing syntenic fragments

cp -p /Users/mm49/workspace/simu2/Conserved.Segments Conserved.Segments ======== creating input files for inferring CARs ======== /Users/mm49/workspace/DESCHRAMBLER/code/makeBlocks/createGenomeFile config.file Conserved.Segments > Genomes.Order /Users/mm49/workspace/DESCHRAMBLER/script/create_blocklist.pl Homo_sapiens APCFs.1K/SFs /Users/mm49/workspace/DESCHRAMBLER/script/wrap_recon_apcf.pl /Users/mm49/workspace/DESCHRAMBLER/matt/tree.txt 1000 Homo_sapiens 0.0001 APCFs.1K/SFs APCFs.1K Error - Could not open net dir /Users/mm49/workspace/DESCHRAMBLER/examples/chainNet/Canis_lupus_familiaris/Homo_sapiens/net Cannot open Canis_lupus_familiaris.processed.segs. make: *** [Building.Blocks] Error 1 readline() on closed filehandle F at /Users/mm49/workspace/DESCHRAMBLER/script/estimate_bpdist4.pl line 61. Error - Could not open net dir /Users/mm49/workspace/DESCHRAMBLER/examples/chainNet/Homo_sapiens/Mus_musculus/net

  • processing Mus_musculus.raw.segs Cannot open Mus_musculus.raw.segs. make: [Grab.Data] Error 1 readline() on closed filehandle F at /Users/mm49/workspace/DESCHRAMBLER/script/estimate_bpdist4.pl line 61. Error - Could not open net dir /Users/mm49/workspace/DESCHRAMBLER/examples/chainNet/Gallus_gallus/Homo_sapiens/net Cannot open Gallus_gallus.processed.segs. make: [Building.Blocks] Error 1 readline() on closed filehandle F at /Users/mm49/workspace/DESCHRAMBLER/script/estimate_bpdist4.pl line 61. Error - Could not open net dir /Users/mm49/workspace/DESCHRAMBLER/examples/chainNet/Homo_sapiens/Monodelphis_domestica/net
  • processing Monodelphis_domestica.raw.segs Cannot open Monodelphis_domestica.raw.segs. make: *** [Grab.Data] Error 1 readline() on closed filehandle F at /Users/mm49/workspace/DESCHRAMBLER/script/estimate_bpdist4.pl line 61. TREE ((((Homo_sapiens,Mus_musculus),Canis_lupus_familiaris)@,Monodelphis_domestica),Gallus_gallus);

UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/workspace/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94. 14984 Homo_sapiens /Users/mm49/workspace/DESCHRAMBLER/matt/tree.txt APCFs.1K/SFs/bpdist.txt Estimate JC parameter: 0.0001 readLeafGenomes: Homo_sapiens readLeafGenomes: Mus_musculus readLeafGenomes: Canis_lupus_familiaris T=14984 Initializing Homo_sapiens (ingroup) Initializing Mus_musculus (ingroup) Initializing Canis_lupus_familiaris (ingroup) Initializing Gallus_gallus (outgroup) Initializing Monodelphis_domestica (outgroup) Computing posterior probabilities ... Died at /Users/mm49/workspace/DESCHRAMBLER/script/refine_adjprob.pl line 26, line 207. Minimum weight = 0.0001 Conservation score file = APCFs.1K/SFs/block_consscores.txt UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/workspace/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94. UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/workspace/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94.

  • Totally 14983 APCFs Some comments:

The first cp is how I hacked my own Conserved.Segments into the workflow I have transformed some of the ``-enclosed commands to print("..."); system("...") so that I could see which command was run and their own output — You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jkimlab/DESCHRAMBLER/issues/6#issuecomment-842409862, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEU6PV4GU3CAJSEDKPMXI2DTOEXSHANCNFSM45AHORSQ.

muffato commented 3 years ago

Thanks. Will do that. What is the unit of the lengths ? Does it matter if I say multiply all branches by 10 ?

jkimlab commented 3 years ago

You can use either wall clock time or expected number of substitutions per site. Using 10 for all branches? If you don’t know branch lengths, you can try it.

      1. 오전 5:52, Matthieu Muffato @.***> 작성:

 Thanks. Will do that. What is the unit of the lengths ? Does it matter if I say multiply all branches by 10 ?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

muffato commented 3 years ago

Hi @jkimlab . It worked, thanks. But there is something strange in Ancestor.APCF: some APCFs have two lines, cf APCF 49 below.

# APCF 48
-5946 5808 -7446 -5360 9524 2263 5072 11131 -6579 -2868 4454 8995 4311 9645 -2002 -13821 -11864 11373 7904 1085 8645 2255 -14836 4026 $
# APCF 49
13429 2530 4130 -3217 -4640 14546 12335 11054 -11708 12400 5330 -2984 -10633 3828 -12585 11574 988 -1885 $
6892 7721 -11547 -1546 -8358 9097 -13496 9663 12152 -12899 9780 -5633 -2899 12566 11257 11761 7787 9701 $
# APCF 50
4695 -14550 -11416 4817 10295 -796 -11621 -6104 -2225 -12957 12116 -152 -2164 5804 -3939 -8186 14545 $
# APCF 51

The documentation says:

Lines starting with "#": # APCF Other lines: the order and orientation of SFs belonging to the APCF shown immediately above line. SFs are shown by using their identifiers (refer to the block_list.txt file).

Which I interpreted as "Here is the content of the 48th APCF, the 49th APCF, etc."

The program says there are 117 APCFs

$ ../DESCHRAMBLER.pl $PWD/params.txt
(...)
- Totally 117 APCFs

but when I check the file, I have this:

$ grep -c '^>' Ancestor.APCF
1
$ grep -c '^#' Ancestor.APCF
63
$ grep -c '\$$' Ancestor.APCF
117

So it looks like the indexing # APCF 49 is wrong and I should just ignore it ?

muffato commented 3 years ago

(apart from that, the results look fine: the ancestral genome found in SFs/block_list.txt matches the expected, simulated, one very well)

jkimlab commented 3 years ago

The output looks strange.

Did the program run without any error message? Can you share again the output log and how you set the tree?

I think the program may not be completely modified for your case.

On May 23, 2021, at 12:16 AM, Matthieu Muffato @.***> wrote:

Hi @jkimlab https://github.com/jkimlab . It worked, thanks. But there is something strange in Ancestor.APCF: some APCFs have two lines, cf APCF 49 below.

APCF 48

-5946 5808 -7446 -5360 9524 2263 5072 11131 -6579 -2868 4454 8995 4311 9645 -2002 -13821 -11864 11373 7904 1085 8645 2255 -14836 4026 $

APCF 49

13429 2530 4130 -3217 -4640 14546 12335 11054 -11708 12400 5330 -2984 -10633 3828 -12585 11574 988 -1885 $ 6892 7721 -11547 -1546 -8358 9097 -13496 9663 12152 -12899 9780 -5633 -2899 12566 11257 11761 7787 9701 $

APCF 50

4695 -14550 -11416 4817 10295 -796 -11621 -6104 -2225 -12957 12116 -152 -2164 5804 -3939 -8186 14545 $

APCF 51

The documentation says:

Lines starting with "#": # APCF Other lines: the order and orientation of SFs belonging to the APCF shown immediately above line. SFs are shown by using their identifiers (refer to the block_list.txt file).

Which I interpreted as "Here is the content of the 48th APCF, the 49th APCF, etc."

The program says there are 117 APCFs

$ ../DESCHRAMBLER.pl $PWD/params.txt (...)

  • Totally 117 APCFs but when I check the file, I have this:

$ grep -c '^>' Ancestor.APCF 1 $ grep -c '^#' Ancestor.APCF 63 $ grep -c '\$$' Ancestor.APCF 117 So it looks like the indexing # APCF 49 is wrong and I should just ignore it ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jkimlab/DESCHRAMBLER/issues/6#issuecomment-846422392, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEU6PVYXTJTLL7NQ52BPS63TO7DFNANCNFSM45AHORSQ.

muffato commented 3 years ago

## Constructing syntenic fragments ##
cp -p /Users/mm49/workspace/agora/simulations/simu2/Conserved.Segments Conserved.Segments
======== creating input files for inferring CARs ========
/Users/mm49/workspace/agora/simulations/DESCHRAMBLER/code/makeBlocks/createGenomeFile config.file Conserved.Segments > Genomes.Order
/Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/create_blocklist.pl Homo_sapiens APCFs.1K/SFs
/Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/wrap_recon_apcf.pl /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/matt/tree.txt 1000 Homo_sapiens 0.0001 APCFs.1K/SFs APCFs.1K
Error - Could not open net dir /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/examples/chainNet/Canis_lupus_familiaris/Homo_sapiens/net
Cannot open Canis_lupus_familiaris.processed.segs.
make: *** [Building.Blocks] Error 1
readline() on closed filehandle F at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/estimate_bpdist4.pl line 61.
Error - Could not open net dir /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/examples/chainNet/Homo_sapiens/Mus_musculus/net
- processing Mus_musculus.raw.segs
Cannot open Mus_musculus.raw.segs.
make: *** [Grab.Data] Error 1
readline() on closed filehandle F at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/estimate_bpdist4.pl line 61.
Error - Could not open net dir /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/examples/chainNet/Gallus_gallus/Homo_sapiens/net
Cannot open Gallus_gallus.processed.segs.
make: *** [Building.Blocks] Error 1
readline() on closed filehandle F at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/estimate_bpdist4.pl line 61.
Error - Could not open net dir /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/examples/chainNet/Homo_sapiens/Monodelphis_domestica/net
- processing Monodelphis_domestica.raw.segs
Cannot open Monodelphis_domestica.raw.segs.
make: *** [Grab.Data] Error 1
readline() on closed filehandle F at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/estimate_bpdist4.pl line 61.
TREE ((((Homo_sapiens:1.5611,Mus_musculus:2.1014):1.5229,Canis_lupus_familiaris:1.1084)@:1.6692,Monodelphis_domestica:1.6262):1.1361,Gallus_gallus:1.1042);

UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94.
14984 Homo_sapiens /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/matt/tree.txt APCFs.1K/SFs/bpdist.txt
Estimate JC parameter: 0.0001
readLeafGenomes: Homo_sapiens
readLeafGenomes: Mus_musculus
readLeafGenomes: Canis_lupus_familiaris
T=14984
Initializing Homo_sapiens (ingroup)
Initializing Mus_musculus (ingroup)
Initializing Canis_lupus_familiaris (ingroup)
Initializing Gallus_gallus (outgroup)
Initializing Monodelphis_domestica (outgroup)
Computing posterior probabilities ...
Minimum weight = 0.0001
Conservation score file = APCFs.1K/SFs/block_consscores.txt
UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94.
UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94.
Use of uninitialized value $len in addition (+) at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/sort_apcfs.pl line 43, <F> line 23.
- Totally 117 APCFs
jkimlab commented 3 years ago

The program didn’t run correctly. The first error that I found is the following. readline() on closed filehandle F at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/estimate_bpdist4.pl line 61. The script tried to open the “Genomes.Order” file, but it failed. Please check if the file is in the right place.

On May 23, 2021, at 12:04 PM, Matthieu Muffato @.***> wrote:

Constructing syntenic fragments

cp -p /Users/mm49/workspace/agora/simulations/simu2/Conserved.Segments Conserved.Segments ======== creating input files for inferring CARs ======== /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/code/makeBlocks/createGenomeFile config.file Conserved.Segments > Genomes.Order /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/create_blocklist.pl Homo_sapiens APCFs.1K/SFs /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/wrap_recon_apcf.pl /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/matt/tree.txt 1000 Homo_sapiens 0.0001 APCFs.1K/SFs APCFs.1K Error - Could not open net dir /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/examples/chainNet/Canis_lupus_familiaris/Homo_sapiens/net Cannot open Canis_lupus_familiaris.processed.segs. make: *** [Building.Blocks] Error 1 readline() on closed filehandle F at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/estimate_bpdist4.pl line 61. Error - Could not open net dir /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/examples/chainNet/Homo_sapiens/Mus_musculus/net

  • processing Mus_musculus.raw.segs Cannot open Mus_musculus.raw.segs. make: [Grab.Data] Error 1 readline() on closed filehandle F at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/estimate_bpdist4.pl line 61. Error - Could not open net dir /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/examples/chainNet/Gallus_gallus/Homo_sapiens/net Cannot open Gallus_gallus.processed.segs. make: [Building.Blocks] Error 1 readline() on closed filehandle F at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/estimate_bpdist4.pl line 61. Error - Could not open net dir /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/examples/chainNet/Homo_sapiens/Monodelphis_domestica/net
  • processing Monodelphis_domestica.raw.segs Cannot open Monodelphis_domestica.raw.segs. make: *** [Grab.Data] Error 1 readline() on closed filehandle F at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/estimate_bpdist4.pl line 61. TREE ((((Homo_sapiens:1.5611,Mus_musculus:2.1014):1.5229,Canis_lupus_familiaris:1.1084)@:1.6692,Monodelphis_domestica:1.6262):1.1361,Gallus_gallus:1.1042);

UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94. 14984 Homo_sapiens /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/matt/tree.txt APCFs.1K/SFs/bpdist.txt Estimate JC parameter: 0.0001 readLeafGenomes: Homo_sapiens readLeafGenomes: Mus_musculus readLeafGenomes: Canis_lupus_familiaris T=14984 Initializing Homo_sapiens (ingroup) Initializing Mus_musculus (ingroup) Initializing Canis_lupus_familiaris (ingroup) Initializing Gallus_gallus (outgroup) Initializing Monodelphis_domestica (outgroup) Computing posterior probabilities ... Minimum weight = 0.0001 Conservation score file = APCFs.1K/SFs/block_consscores.txt UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94. UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94. Use of uninitialized value $len in addition (+) at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/sort_apcfs.pl line 43, line 23.

muffato commented 3 years ago

Yes, because I don't have actual sequences. I'm trying to run it on simulated data like in your paper. All I have is a set of orthologous regions between all my genomes.

jkimlab commented 3 years ago

Yes, I know. But the “Genomes.Order” file can be generated from the “Conserved.Segments” file that you created manually. Check step 5 in the following file.

https://github.com/jkimlab/DESCHRAMBLER/blob/master/examples/Makefile.SFs https://github.com/jkimlab/DESCHRAMBLER/blob/master/examples/Makefile.SFs

Without the “Genomes.Order” file, the program can’t make correct output.

On May 23, 2021, at 9:26 PM, Matthieu Muffato @.***> wrote:

Yes, because I don't have actual sequences. I'm trying to run it on simulated data like in your paper. All I have is a set of orthologous regions between all my genomes.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jkimlab/DESCHRAMBLER/issues/6#issuecomment-846555135, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEU6PV4TKCZ7Z7XJTPGEYYDTPDX7FANCNFSM45AHORSQ.

muffato commented 3 years ago

Genomes.Order is there and has this sort of data:

>Canis_lupus_familiaris 39
# 15
-6087 9829 -9582 -3175 -13854 3312 6454

All ingroups species are in there, and they seem complete

jkimlab commented 3 years ago

Then, I don’t know the reason of the error by just checking the information you provided. If you can send the compressed tar ball of your working directory (including the parameter file) to me, I can check.

On May 23, 2021, at 9:51 PM, Matthieu Muffato @.***> wrote:

Genomes.Order is there and has this sort of data:

Canis_lupus_familiaris 39

15

-6087 9829 -9582 -3175 -13854 3312 6454 All ingroups species are in there, and they seem complete

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jkimlab/DESCHRAMBLER/issues/6#issuecomment-846558399, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEU6PVYMDUCBUY6MBPSPDILTPD23TANCNFSM45AHORSQ.

muffato commented 3 years ago

Sure. Here it is: simu2.zip Thank you !

jkimlab commented 3 years ago

OK. I found the reason.

In the “SFs” directory, you can see subdirectories with a name “SFs_”. They contain the “Conserved.Segments” and “Genomes.Order” files created by using pairwise sequence alignment between a reference species and the target species specified in the subdirectory name.

So, to run the program correctly, you need to additionally create the above two files in each of the subdirectories. The error occurred because the “Genomes.Order” files do not exist in those subdirectories.

Please try and let me know how it goes.

On May 23, 2021, at 10:33 PM, Matthieu Muffato @.***> wrote:

Sure. Here it is: simu2.zip https://github.com/jkimlab/DESCHRAMBLER/files/6528148/simu2.zip Thank you !

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jkimlab/DESCHRAMBLER/issues/6#issuecomment-846564090, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEU6PV4IGV5V5KO5Z4YFYL3TPD7ZJANCNFSM45AHORSQ.

muffato commented 3 years ago

OK, I'll give it a try. Thanks ! Why did I get something that looks like the reconstructed genome in SFs/block_list.txt ? It's very very close to the expectation

jkimlab commented 3 years ago

The block_list.txt file just contains reference-based coordinates of synteny blocks you defined in the Conserved.Segments file. So they are blocks in human genome, not in ancestor, and the program predicts the most probable order and orientation of them in the target ancestor.

On May 24, 2021, at 1:51 AM, Matthieu Muffato @.***> wrote:

OK, I'll give it a try. Thanks ! Why did I get something that looks like the reconstructed genome in SFs/block_list.txt ? It's very very close to the expectation

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jkimlab/DESCHRAMBLER/issues/6#issuecomment-846592598, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEU6PVYY3WPHOADBK5KVRLDTPEW7PANCNFSM45AHORSQ.

muffato commented 3 years ago

Still the same discrepancy :(

$ ../DESCHRAMBLER.pl $PWD/params.txt

## Constructing syntenic fragments ##
/Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/create_blocklist.pl Homo_sapiens /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/matt/APCFs.1K/SFs
/Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/wrap_recon_apcf.pl /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/matt/tree.txt 1000 Homo_sapiens 0.0001 /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/matt/APCFs.1K/SFs /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/matt/APCFs.1K
TREE ((((Homo_sapiens:1.5611,Mus_musculus:2.1014):1.5229,Canis_lupus_familiaris:1.1084)@:1.6692,Monodelphis_domestica:1.6262):1.1361,Gallus_gallus:1.1042);

UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94.
14984 Homo_sapiens /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/matt/tree.txt /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/matt/APCFs.1K/SFs/bpdist.txt
Estimate JC parameter: 9.65081390939419e-06
readLeafGenomes: Homo_sapiens
readLeafGenomes: Mus_musculus
readLeafGenomes: Canis_lupus_familiaris
T=14984
Initializing Homo_sapiens (ingroup)
Initializing Mus_musculus (ingroup)
Initializing Canis_lupus_familiaris (ingroup)
Initializing Gallus_gallus (outgroup)
Initializing Monodelphis_domestica (outgroup)
Computing posterior probabilities ...
Minimum weight = 0.0001
Conservation score file = /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/matt/APCFs.1K/SFs/block_consscores.txt
UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94.
UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94.
Use of uninitialized value $len in addition (+) at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/sort_apcfs.pl line 43, <F> line 137.
- Totally 115 APCFs
$ grep -c '^>' Ancestor.APCF
1
$ grep -c '^#' Ancestor.APCF
63
$ grep -c '\$$' Ancestor.APCF
115

and the IDs in Ancestor.APCF seem to differ from the other files, e.g. block_list.txt because none of the adjacency can be found in the true ancestral genome.

jkimlab commented 3 years ago

OK. Can you send me your files again?

      1. 오전 6:53, Matthieu Muffato @.***> 작성:

 Still the same discrepancy :(

$ ../DESCHRAMBLER.pl $PWD/params.txt

Constructing syntenic fragments

/Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/create_blocklist.pl Homo_sapiens /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/matt/APCFs.1K/SFs /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/wrap_recon_apcf.pl /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/matt/tree.txt 1000 Homo_sapiens 0.0001 /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/matt/APCFs.1K/SFs /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/matt/APCFs.1K TREE ((((Homo_sapiens:1.5611,Mus_musculus:2.1014):1.5229,Canis_lupus_familiaris:1.1084)@:1.6692,Monodelphis_domestica:1.6262):1.1361,Gallus_gallus:1.1042);

UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94. 14984 Homo_sapiens /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/matt/tree.txt /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/matt/APCFs.1K/SFs/bpdist.txt Estimate JC parameter: 9.65081390939419e-06 readLeafGenomes: Homo_sapiens readLeafGenomes: Mus_musculus readLeafGenomes: Canis_lupus_familiaris T=14984 Initializing Homo_sapiens (ingroup) Initializing Mus_musculus (ingroup) Initializing Canis_lupus_familiaris (ingroup) Initializing Gallus_gallus (outgroup) Initializing Monodelphis_domestica (outgroup) Computing posterior probabilities ... Minimum weight = 0.0001 Conservation score file = /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/matt/APCFs.1K/SFs/block_consscores.txt UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94. UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94. Use of uninitialized value $len in addition (+) at /Users/mm49/workspace/agora/simulations/DESCHRAMBLER/script/sort_apcfs.pl line 43, line 137.

  • Totally 115 APCFs $ grep -c '^>' Ancestor.APCF 1 $ grep -c '^#' Ancestor.APCF 63 $ grep -c '\$$' Ancestor.APCF 115 and the IDs in Ancestor.APCF seem to differ from the other files, e.g. block_list.txt because none of the adjacency can be found in the true ancestral genome.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

muffato commented 3 years ago

Sure :) simu2b.zip

jkimlab commented 3 years ago

I found some problems in your Conserved.Segments file.

You need to use positive numbers, not 0, for all blocks, the line for a reference species needs to be appear first in a block definition, and the orientation of the reference block should be always +.

For example, I found the following block definition.

0 Gallus_gallus.16:415000-416000 - Monodelphis_domestica.4:5000-6000 - Canis_lupus_familiaris.15:651000-652000 + Homo_sapiens.13:648000-649000 + Mus_musculus.13:206000-207000 +

It needs to be changed as follows.

1 Homo_sapiens.13:648000-649000 + Gallus_gallus.16:415000-416000 - Monodelphis_domestica.4:5000-6000 - Canis_lupus_familiaris.15:651000-652000 + Mus_musculus.13:206000-207000 +

And in the case of the following block,

2 Gallus_gallus.9:40000-41000 - Monodelphis_domestica.2:641000-642000 - Canis_lupus_familiaris.22:499000-500000 - Homo_sapiens.20:286000-287000 -

It needs to be changed as follows.

2 Homo_sapiens.20:286000-287000 + Gallus_gallus.9:40000-41000 + Monodelphis_domestica.2:641000-642000 + Canis_lupus_familiaris.22:499000-500000 +

I not sure but I think the block order of other species does not matter.

And if there is not any aligned blocks from other species to the reference, it should be removed in the Conserved.Segments file. The example is below.

14983 Homo_sapiens.12:167000-168000 +

Please fix the above problems, and try again.

On May 24, 2021, at 9:49 AM, Matthieu Muffato @.***> wrote:

Sure :) simu2b.zip https://github.com/jkimlab/DESCHRAMBLER/files/6529209/simu2b.zip — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jkimlab/DESCHRAMBLER/issues/6#issuecomment-846660922, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEU6PV6XAWMNG2QWLRFKTJDTPGPAZANCNFSM45AHORSQ.

muffato commented 3 years ago

Ah, I see. Yes, I can definitely change that 👍 . Thank you !

muffato commented 3 years ago

Hi, I applied those fixes and got a clean run, but there is still the same weirdness in Ancestor.APCF. The only changes I have are:

$ git diff
diff --git a/DESCHRAMBLER.pl b/DESCHRAMBLER.pl
index 134e6f5..7b66d1c 100755
--- a/DESCHRAMBLER.pl
+++ b/DESCHRAMBLER.pl
@@ -2,10 +2,11 @@

 use strict;
 use warnings;
-use FindBin qw($Bin);
 use Cwd;
 use Cwd 'abs_path';

+my $Bin = '/Users/mm49/simulations/DESCHRAMBLER';
+
 # check the number of argument
 if ($#ARGV+1 != 1) {
        print STDERR "Usage: ./DESCHRAMBLER.pl <parameter file>\n";
@@ -45,7 +46,7 @@ my $cwd = getcwd();
 `sed -e 's:<willbechanged>:$Bin/code/makeBlocks:;s:<treewillbechanged>:$params{"TREEFILE"}:' $params{"MAKESFSFILE"} > $sf_dir/Makefile`;
 #}
 chdir($sf_dir);
-`make all`;
+`make Genomes.Order`;

 chdir($cwd);
 `$Bin/script/create_blocklist.pl $params{"REFSPC"} $sf_dir`; 
diff --git a/script/estimate_bpdist4.pl b/script/estimate_bpdist4.pl
index e440285..5af8cfe 100755
--- a/script/estimate_bpdist4.pl
+++ b/script/estimate_bpdist4.pl
@@ -52,7 +52,7 @@ foreach my $tar_spc (@inspcs) {
        `cp $data_dir/Makefile $out_dir/`;

        chdir($out_dir);
-       `make pair`;
+       `make Genomes.Order`;

        open(F,"Genomes.Order");
        my $num_tarscf = 0;

$ diff examples/Makefile.SFs matt/Makefile.SFs 
37c37
< Conserved.Segments: Orthology.Blocks
---
> XConserved.Segments: Orthology.Blocks
45a46,48
> Conserved.Segments:
>   /Users/mm49/simulations/simu2/filter_CS.py $F /Users/mm49/simulations/simu2/Conserved.Segments > $@
> 

$ diff examples/params.txt matt/params.txt 
2c2
< REFSPC=spc1
---
> REFSPC=Homo_sapiens
5c5
< OUTPUTDIR=APCFs.300K
---
> OUTPUTDIR=APCFs.1K
9c9
< RESOLUTION=300000
---
> RESOLUTION=1000

$ diff examples/config.SFs.tmp matt/config.SFs 
14,16c14,18
< spc1 0 1
< spc2 1 1 
< spc3 2 1 
---
> Canis_lupus_familiaris 1 1
> Gallus_gallus 2 1
> Homo_sapiens 0 1
> Monodelphis_domestica 2 1
> Mus_musculus 1 1

Some comments:

The output is

$ ../DESCHRAMBLER.pl $PWD/params.txt

## Constructing syntenic fragments ##
TREE ((((Homo_sapiens:1.5611,Mus_musculus:2.1014):1.5229,Canis_lupus_familiaris:1.1084)@:1.6692,Monodelphis_domestica:1.6262):1.1361,Gallus_gallus:1.1042);

UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49//simulations/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94.
14984 Homo_sapiens /Users/mm49/simulations/DESCHRAMBLER/matt/tree.txt APCFs.1K/SFs/bpdist.txt
Estimate JC parameter: 1.31554210696016e-05
readLeafGenomes: Homo_sapiens
readLeafGenomes: Mus_musculus
readLeafGenomes: Canis_lupus_familiaris
T=14984
Initializing Homo_sapiens (ingroup)
Initializing Mus_musculus (ingroup)
Initializing Canis_lupus_familiaris (ingroup)
Initializing Gallus_gallus (outgroup)
Initializing Monodelphis_domestica (outgroup)
Computing posterior probabilities ...
Minimum weight = 0.0001
Conservation score file = APCFs.1K/SFs/block_consscores.txt
UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/simulations/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94.
UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/simulations/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94.
- Totally 115 APCFs

and the consistency tests:

$ grep -c '^#' APCFs.1K/Ancestor.APCF
63
9$ grep -c '\$$' APCFs.1K/Ancestor.APCF
115
jkimlab commented 3 years ago

Can you share again the working directory containing all fixed data files in all subdirectories?

      1. 오전 8:10, Matthieu Muffato @.***> 작성:

 Hi, I applied those fixes and got a clean run, but there is still the same weirdness in Ancestor.APCF. The only changes I have are:

$ git diff diff --git a/DESCHRAMBLER.pl b/DESCHRAMBLER.pl index 134e6f5..7b66d1c 100755 --- a/DESCHRAMBLER.pl +++ b/DESCHRAMBLER.pl @@ -2,10 +2,11 @@

use strict; use warnings; -use FindBin qw($Bin); use Cwd; use Cwd 'abs_path';

+my $Bin = '/Users/mm49/simulations/DESCHRAMBLER'; +

check the number of argument

if ($#ARGV+1 != 1) { print STDERR "Usage: ./DESCHRAMBLER.pl \n"; @@ -45,7 +46,7 @@ my $cwd = getcwd(); sed -e 's:<willbechanged>:$Bin/code/makeBlocks:;s:<treewillbechanged>:$params{"TREEFILE"}:' $params{"MAKESFSFILE"} > $sf_dir/Makefile;

}

chdir($sf_dir); -make all; +make Genomes.Order;

chdir($cwd); $Bin/script/create_blocklist.pl $params{"REFSPC"} $sf_dir; diff --git a/script/estimate_bpdist4.pl b/script/estimate_bpdist4.pl index e440285..5af8cfe 100755 --- a/script/estimate_bpdist4.pl +++ b/script/estimate_bpdist4.pl @@ -52,7 +52,7 @@ foreach my $tar_spc @.***) { cp $data_dir/Makefile $out_dir/;

    chdir($out_dir);
  • make pair;
  • make Genomes.Order;

    open(F,"Genomes.Order");
    my $num_tarscf = 0;

$ diff examples/Makefile.SFs matt/Makefile.SFs 37c37 < Conserved.Segments: Orthology.Blocks

XConserved.Segments: Orthology.Blocks 45a46,48 Conserved.Segments: /Users/mm49/simulations/simu2/filter_CS.py $F /Users/mm49/simulations/simu2/Conserved.Segments > $@

$ diff examples/params.txt matt/params.txt 2c2 < REFSPC=spc1

REFSPC=Homo_sapiens 5c5 < OUTPUTDIR=APCFs.300K

OUTPUTDIR=APCFs.1K 9c9 < RESOLUTION=300000

RESOLUTION=1000

$ diff examples/config.SFs.tmp matt/config.SFs 14,16c14,18 < spc1 0 1 < spc2 1 1 < spc3 2 1

Canis_lupus_familiaris 1 1 Gallus_gallus 2 1 Homo_sapiens 0 1 Monodelphis_domestica 2 1 Mus_musculus 1 1 Some comments:

I replaced make all and make pairwithmake Genomes.Orderand changed the ruleConserved.Segments` so that it doesn't try to access the alignment nets /Users/mm49/simulations/simu2/filter_CS.py is my script to filter the master Conserved.Segments file down to the species listed in config.file. It removes blocks when the reference species is the only one left, cf #14983 in your previous reply. By the way, when I remove a block, do I need to decrement the index of the following blocks ? All my chromosome names now start with chr (I noticed this was a requirement of using 1 as the second tag The output is

$ ../DESCHRAMBLER.pl $PWD/params.txt

Constructing syntenic fragments

TREE ((((Homo_sapiens:1.5611,Mus_musculus:2.1014):1.5229,Canis_lupus_familiaris:1.1084)@:1.6692,Monodelphis_domestica:1.6262):1.1361,Gallus_gallus:1.1042);

UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49//simulations/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94. 14984 Homo_sapiens /Users/mm49/simulations/DESCHRAMBLER/matt/tree.txt APCFs.1K/SFs/bpdist.txt Estimate JC parameter: 1.31554210696016e-05 readLeafGenomes: Homo_sapiens readLeafGenomes: Mus_musculus readLeafGenomes: Canis_lupus_familiaris T=14984 Initializing Homo_sapiens (ingroup) Initializing Mus_musculus (ingroup) Initializing Canis_lupus_familiaris (ingroup) Initializing Gallus_gallus (outgroup) Initializing Monodelphis_domestica (outgroup) Computing posterior probabilities ... Minimum weight = 0.0001 Conservation score file = APCFs.1K/SFs/block_consscores.txt UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/simulations/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94. UNIVERSAL->import is deprecated and will be removed in a future perl at /Users/mm49/simulations/DESCHRAMBLER/script/../lib/perl/Bio/Tree/TreeFunctionsI.pm line 94.

  • Totally 115 APCFs and the consistency tests:

$ grep -c '^#' APCFs.1K/Ancestor.APCF 63 9$ grep -c '\$$' APCFs.1K/Ancestor.APCF 115 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

muffato commented 3 years ago

Sure :) I was expecting that but didn't want to jump the gun simu2c.zip

jkimlab commented 3 years ago

All input data files are now OK, but I found that there was a problem in our script when sorting final APCFs with the same length.

I modified the script “sort_apcfs.pl” in the “script” directory. Can you re-download the script and try again?

On May 26, 2021, at 11:49 PM, Matthieu Muffato @.***> wrote:

Sure :) I was expecting that but didn't want to jump the gun simu2c.zip https://github.com/jkimlab/DESCHRAMBLER/files/6547565/simu2c.zip — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jkimlab/DESCHRAMBLER/issues/6#issuecomment-848833599, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEU6PV7IEVUEOHMIXD76VIDTPUC55ANCNFSM45AHORSQ.

muffato commented 3 years ago

Ah, super, thanks ! All right. Everything's now fine. I could parse Ancestor.APCF and match all the IDs 👍

Thank you so much for your help !