rugilemat closed 7 months ago
Hi, thanks for your kind words :)
I seem to recall debugging a similar problem for myself semi-recently. Can you tell me if you're using the latest commits from GitHub? If not, would you mind trying that?
Yes, it should be the latest commit.
OK, based on the warning you're getting from AnnData, it would appear that you have some duplicated transcript IDs, likely in your abundance matrix, judging by where the error is being thrown. To test this, please run one of the following code blocks in Python, depending on what format your data is in:
If you're using a TALON abundance file:
import pandas as pd
df = pd.read_csv('<your abundance file>', sep='\t')
print(df.loc[df.annot_transcript_id.duplicated(keep=False)].sort_values(by='annot_transcript_id'))
If you're using the non-specific formatted abundance file:
import pandas as pd
df = pd.read_csv('<your abundance file>', sep='\t')
print(df.loc[df[df.columns[0]].duplicated(keep=False)].sort_values(by=[df.columns[0]]))
If this prints any data, you have duplicated transcript IDs in your dataset which you must address. Let me know if this helps you solve the problem, or if this code doesn't run for you (I did not test it).
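For reference, here is the same check wrapped in a small function and run on a made-up table, so you can see what a positive result looks like. This is only a sketch: the helper name `find_duplicate_ids`, the toy data, and the sample column names are all hypothetical. The last step (summing counts per ID) is just one possible way to collapse duplicates, and whether it is appropriate depends on why the duplicates exist in your data.

```python
import pandas as pd


def find_duplicate_ids(df, id_col=None):
    """Return every row whose ID occurs more than once, sorted by ID.

    If id_col is None, the first column is assumed to hold the transcript IDs.
    """
    if id_col is None:
        id_col = df.columns[0]
    dupes = df.loc[df[id_col].duplicated(keep=False)]
    return dupes.sort_values(by=id_col)


# Toy abundance table (made-up values); "tx1" appears twice.
toy = pd.DataFrame({
    "annot_transcript_id": ["tx1", "tx2", "tx1", "tx3"],
    "sample_a": [5, 0, 2, 7],
    "sample_b": [1, 3, 4, 0],
})

dupes = find_duplicate_ids(toy, "annot_transcript_id")
print(dupes)

# One possible fix (an assumption, not necessarily right for your data):
# sum the counts of rows that share a transcript ID. This only makes sense
# for numeric count columns, so select those columns first on a real file.
collapsed = toy.groupby("annot_transcript_id", as_index=False, sort=False).sum()
print(collapsed)
```

If the duplicated rows turn out to be exact copies of each other, `df.drop_duplicates()` may be a simpler fix than summing.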
Thanks for this and sorry for the delay - they have been updating our HPC and it's been a pain to get any jobs run. It seems this sorted the issue out - thanks!
Hi,
Thanks for a gorgeous tool!
I've been trying out Swan on my samples, but I keep running into this error:
I'm not entirely sure where the duplicate entry issue is coming from, so any advice on that would be great!