jkimlab / DESCHRAMBLER


Running on simulated genomes #9

Closed muffato closed 2 years ago

muffato commented 2 years ago

Hi,

In #6 you helped me run DESCHRAMBLER on simulated genomes. It worked at the time, but I'm now facing a new issue on a different dataset.

DESCHRAMBLER doesn't fail, but there are two weird things:

  1. The output file APCFs.1K/Ancestor.APCF is not sorted by decreasing size. Last time, this was an indication that something was not right.
  2. There are plenty of messages on stderr such as:
    Argument "0-" isn't numeric in subtraction (-) at /Users/mm49/workspace/agora/DESCHRAMBLER/script/compute_size.pl line 15, <F> line 122.
    Argument "-2000-" isn't numeric in subtraction (-) at /Users/mm49/workspace/agora/DESCHRAMBLER/script/compute_size.pl line 15, <F> line 138.

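What I've debugged so far:

  1. The warnings come from this command:
    ~/workspace/agora/DESCHRAMBLER/script/compute_size.pl APCF APCFs.1K/APCF_Homo_sapiens.merged.map
  2. APCFs.1K/APCF_Homo_sapiens.merged.map has got some weird entries for APCF, e.g.:
    16 APCF.2:0--1000 + Homo_sapiens.chr27:1241000-1240000 +
    17 APCF.2:-1000-0 + Homo_sapiens.chr27:1242000-1243000 +

I believe those negative values mess up the computation of the CAR sizes, and therefore their ordering.

Do you have any idea what could have caused this?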

Best regards, Matthieu

jkimlab commented 2 years ago

I haven’t seen such errors with real data, so I don’t know exactly why they occurred.

I am not sure, but one possibility is a problem in the simulated data. Errors in the simulated data may cause errors in the chain/net files, which can then cause problems when the program runs.

Thanks.

On Jan 11, 2022, at 10:41 AM, Matthieu Muffato @.***> wrote:

Hi,

In #6 https://github.com/jkimlab/DESCHRAMBLER/issues/6 you helped me run DESCHRAMBLER on simulated genomes. It worked at the time, but I'm now facing a new issue on a different dataset.

DESCHRAMBLER doesn't fail, but there are two weird things:

  1. The output file APCFs.1K/Ancestor.APCF is not sorted by decreasing size. Last time, this was the indication that there was something not right
  2. There are plenty of messages on stderr such as
    Argument "0-" isn't numeric in subtraction (-) at /Users/mm49/workspace/agora/DESCHRAMBLER/script/compute_size.pl line 15, <F> line 122.
    Argument "-2000-" isn't numeric in subtraction (-) at /Users/mm49/workspace/agora/DESCHRAMBLER/script/compute_size.pl line 15, <F> line 138.

What I've debugged so far:

  1. The warnings come from this command:
    ~/workspace/agora/DESCHRAMBLER/script/compute_size.pl APCF APCFs.1K/APCF_Homo_sapiens.merged.map
  2. APCFs.1K/APCF_Homo_sapiens.merged.map has got some weird entries for APCF, e.g.:
    16 APCF.2:0--1000 + Homo_sapiens.chr27:1241000-1240000 +
    17 APCF.2:-1000-0 + Homo_sapiens.chr27:1242000-1243000 +

I believe those negative values mess up the computation of the CAR sizes, and therefore their ordering.

Do you have any idea what could have caused this ?

Best regards, Matthieu


muffato commented 2 years ago

Hi. In these simulations, I start directly from a list of conserved regions, with no chains/nets. The difficulty last time was to format Conserved.Segments correctly, with all the implicit rules such as "chromosomes must be named chr[0-9]+", "the reference species must always be on the positive strand", etc. I've tried a few more things with no luck. I still get the same thing: some negative coordinates, a parser that doesn't expect negative values, and as a result a sum of lengths that is not right. I actually don't know if the reconstruction is even affected. Maybe the problem is really just at the very end and prevents the output file from being sorted, with no impact on the adjacencies 🤞🏼
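For anyone wanting to check a map file for the same symptom, here is a minimal Python sketch that flags region fields with negative or reversed coordinates. It assumes the fields are whitespace-separated and look like `name:start-end`, as in the two entries quoted earlier in this thread. The names `parse_region`, `suspicious_regions`, `REGION` and `GREEDY` are made up for illustration. The greedy pattern is only a guess at the kind of parsing that would produce tokens like "0-"; it is not taken from compute_size.pl.

```python
import re

# Tolerant pattern: allows an optional minus sign on both coordinates.
REGION = re.compile(r"^(?P<name>[^:]+):(?P<start>-?\d+)-(?P<end>-?\d+)$")
# Greedy pattern, shown only for contrast (an assumption, NOT the actual
# pattern used by compute_size.pl).
GREEDY = re.compile(r"^(\S+):(\S+)-(\S+)$")


def parse_region(field):
    """Return (name, start, end) with integer coordinates, or None."""
    m = REGION.match(field)
    if m is None:
        return None
    return m.group("name"), int(m.group("start")), int(m.group("end"))


def suspicious_regions(lines):
    """Yield (line_number, field, reason) for negative or reversed intervals."""
    for lineno, line in enumerate(lines, start=1):
        for field in line.split():
            region = parse_region(field)
            if region is None:
                continue  # not a name:start-end field (e.g. "+" or "16")
            _, start, end = region
            if start < 0 or end < 0:
                yield lineno, field, "negative coordinate"
            elif start > end:
                yield lineno, field, "start > end"


# The two entries quoted from APCF_Homo_sapiens.merged.map in this thread.
sample = [
    "16 APCF.2:0--1000 + Homo_sapiens.chr27:1241000-1240000 +",
    "17 APCF.2:-1000-0 + Homo_sapiens.chr27:1242000-1243000 +",
]
for hit in suspicious_regions(sample):
    print(hit)

# With the greedy pattern, "0--1000" is split into "0-" and "1000".
print(GREEDY.match("APCF.2:0--1000").groups())
```

To scan a real file, pass the open file handle instead of `sample`, e.g. `suspicious_regions(open(path))`. The sketch only reports problem entries; it doesn't say whether the reconstruction itself is affected.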

Cheers,