scanoss / minr

SCANOSS Mining tool

'minr -i' command encountered a keyboard interruption, raised E056 Data sector corrupted error #30

Closed DRong1121 closed 1 year ago

DRong1121 commented 1 year ago

I am using the 'minr' tool to import a local project into LDB. I executed three commands:

1) minr -d ... -u /mnt/my_project.zip -o /opt
2) minr -z /opt/mined
3) minr -i /opt/mined

The third command was interrupted from the keyboard, and the import step stopped. I re-executed the three commands above, but the third one now raises an ldb E056 error, as follows:

[image: screenshot of the ldb E056 "Data sector corrupted" error]

The project, /mnt/my_project.zip, cannot be imported anymore. Is there any solution to my problem?

scanoss-qg commented 1 year ago

Hi @DRong1121! Let us identify the issue and we will get back to you soon. Thank you!

mscasso-scanoss commented 1 year ago

Dear @DRong1121, we apologize for the inconvenience you're experiencing. We couldn't reproduce the issue, but have you tried deleting the failed KB with the command "rm -r /var/lib/ldb/oss" before importing again? The error could be due to a sector left broken by the aborted import. If the repository you're mining isn't private, can you share the CSV with us? We want to reproduce the issue and find a resolution for you.

DRong1121 commented 1 year ago

Hi @mscasso-scanoss, thank you for your reply! My current work uses the minr tool to mine and import millions of open-source projects that are already downloaded on my local machine. The issue is that if I manually abort the mining and importing process, for example with Ctrl+C during the minr -i command, the aborted import leaves a broken sector in the KB. I can use the command 'rm -r /var/lib/ldb/oss' to remove the failed KB and create a new one to restart my process, but that is quite disappointing because everything I previously imported into the KB is lost. Is there any way to fix the broken sector error without deleting the whole KB? Thanks again!

mscasso commented 1 year ago

Hi @DRong1121, sorry about the issue. We've created a ticket to add new functionality for safely aborting the import process. You can avoid deleting the entire KB by removing the broken sector and re-importing that part. Could you provide more details on what you're trying to achieve? We can set up a meeting to discuss and provide tailored solutions.

DRong1121 commented 1 year ago

Hi @mscasso, thank you for your timely reply! Here is a description of what I am trying to achieve. I have downloaded 4 million open-source projects to my local machine. I wrote a program that uses a multi-processing module to load and mine (minr -d -u, minr -z) each compressed project into its own /mined directory on my local machine. Since concurrent import into LDB is not supported, I wrote a second, single-process program that imports (minr -i) each project's /mined directory into LDB (/var/lib/ldb/oss).

Since I have millions of /mined directories to import, the import program may be interrupted during execution. The core problem is that an aborted import leaves a broken sector in the current KB, which makes it impossible to re-import the previous /mined directory. What should I do to remove the broken sector without deleting the entire KB, so that I can restart my import program? Thanks!
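One way to reduce the risk described here, until a safe abort exists in the tool itself, is to make the single-process import driver defer Ctrl+C until the running import has finished. The following is a minimal sketch under stated assumptions, not part of minr: the function name `import_all` and its parameters are hypothetical, it assumes `minr -i <dir>` needs no keyboard input and exits with a non-zero status on failure, and it only guards against Ctrl+C on POSIX systems (not against `kill -9`, crashes, or power loss).

```python
import signal
import subprocess


def import_all(mined_dirs, command=("minr", "-i")):
    """Run `command <dir>` for each directory in turn; return those that succeeded.

    A Ctrl+C received while an import is running is only recorded. The loop
    stops after the current import has finished. (Hypothetical helper.)
    """
    stop_requested = False

    def remember_interrupt(signum, frame):
        nonlocal stop_requested
        stop_requested = True

    previous_handler = signal.signal(signal.SIGINT, remember_interrupt)
    imported = []
    try:
        for mined_dir in mined_dirs:
            if stop_requested:
                break
            # start_new_session=True (POSIX) moves the child out of the terminal's
            # foreground process group, so Ctrl+C is not delivered to it directly.
            result = subprocess.run([*command, str(mined_dir)], start_new_session=True)
            if result.returncode != 0:
                break  # stop at the first failure so it can be inspected
            imported.append(str(mined_dir))
    finally:
        # Restore whatever SIGINT handler was installed before the loop.
        signal.signal(signal.SIGINT, previous_handler)
    return imported
```

Called with a list of per-project /mined directories, the function returns the directories that were imported, so a later run can skip them and resume from the first one that is missing.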

mscasso commented 1 year ago

Dear @DRong1121, it appears that you have a broken sector in your file table based on the output you have provided. To locate the affected sector, you can run the command "ls /var/lib/ldb/oss/file", which will display multiple ".ldb" files representing the sectors. Based on your output, the problematic sector could be either "e5.ldb" or "e6.ldb". To attempt to resolve the issue, you can try renaming one of these files by adding ".back" to the filename. After that, you can reimport the data from a "mined" folder that only contains the CSV corresponding to the affected sector.
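The rename step described above can be sketched as follows. This is only an illustration: the helper name `set_aside_sector` is hypothetical, and it assumes no import is running while the file is renamed. The directory "/var/lib/ldb/oss/file" and the sector names "e5.ldb" and "e6.ldb" are the ones mentioned in this thread.

```python
from pathlib import Path


def set_aside_sector(table_dir, sector_name, suffix=".back"):
    """Rename <table_dir>/<sector_name> to <sector_name><suffix>; return the new path.

    Hypothetical helper: it refuses to overwrite an existing backup, so the
    original sector data is kept on disk under the new name.
    """
    sector = Path(table_dir) / sector_name
    backup = sector.with_name(sector.name + suffix)
    if not sector.is_file():
        raise FileNotFoundError(f"no such sector file: {sector}")
    if backup.exists():
        raise FileExistsError(f"refusing to overwrite {backup}")
    sector.rename(backup)
    return backup
```

For example, `set_aside_sector("/var/lib/ldb/oss/file", "e5.ldb")` would leave "e5.ldb.back" in place of "e5.ldb". Renaming the file back undoes the step.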

DRong1121 commented 1 year ago

Dear @mscasso, based on your reply, to solve the broken sector problem I have to 1) find the problematic sector, 2) rename this sector, and 3) re-import only the CSV corresponding to the affected sector. What I want to check is this: since the /var/lib/ldb/oss folder already contains data from many projects, will renaming one of the problematic .ldb files in /oss/file affect my current KB? For example, would the data in that .ldb file be lost after the rename?

mscasso-scanoss commented 1 year ago

@DRong1121, it is always recommended to have a backup of the complete knowledge base before applying a new update. However, if you follow the steps you have described, you should not break anything else. We are currently working on simplifying, speeding up, and making the import procedure safer, and we will have a new release of these tools soon. In the meantime, a good backup is always welcome.
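A backup of the kind recommended here could be taken with a short script before any sector is renamed or any update is applied. This is a minimal sketch under assumptions: the helper name `backup_kb` and the destination directory are hypothetical, no import should be running while the archive is written, and a KB holding millions of projects may need considerable time and disk space to archive.

```python
import tarfile
from datetime import datetime
from pathlib import Path


def backup_kb(kb_dir, dest_dir):
    """Archive the whole KB directory into a timestamped .tar file in dest_dir.

    Hypothetical helper; returns the path of the archive that was written.
    """
    kb_dir = Path(kb_dir)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = Path(dest_dir) / f"{kb_dir.name}-{stamp}.tar"
    with tarfile.open(str(archive), "w") as tar:
        # arcname keeps paths inside the archive relative (e.g. "oss/file/e5.ldb").
        tar.add(str(kb_dir), arcname=kb_dir.name)
    return archive
```

For the KB in this thread the call would be `backup_kb("/var/lib/ldb/oss", dest_dir)`, with `dest_dir` pointing at a volume that has enough free space.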