Open kernelogic opened 2 years ago
I believe this is partially implemented as car ls --unixfs baga6ea4seaqab7qam2an2mkzssn7vioorrcxhaxxszz6k4t6mscwnhvfjj4hoaq.car
@willscott oh ya it works! I guess I need to run it twice, once for the CIDs and once for the filenames.
car ls --unixfs /mnt/pool243/cars/worldbank/baga6ea4seaqbm326hi63viv5pmmc3qn4ekt7veu7qdgcdpkcxtu7isao2wllkdi.car
globalnightlight
globalnightlight/201204
globalnightlight/201204/SVDNB_npp_d20120418_t0206526_e0212330_b02450_c20120418081234074938_noaa_ops.rade9.co.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0206526_e0212330_b02450_c20120418081234074938_noaa_ops.rade9.co.json
globalnightlight/201204/SVDNB_npp_d20120418_t0206526_e0212330_b02450_c20120418081234074938_noaa_ops.rade9.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0212343_e0218146_b02450_c20120418081816258190_noaa_ops.rade9.co.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0212343_e0218146_b02450_c20120418081816258190_noaa_ops.rade9.co.json
globalnightlight/201204/SVDNB_npp_d20120418_t0212343_e0218146_b02450_c20120418081816258190_noaa_ops.rade9.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0218159_e0223563_b02450_c20120418082357478905_noaa_ops.rade9.co.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0218159_e0223563_b02450_c20120418082357478905_noaa_ops.rade9.co.json
globalnightlight/201204/SVDNB_npp_d20120418_t0218159_e0223563_b02450_c20120418082357478905_noaa_ops.rade9.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0315103_e0320507_b02451_c20120418092051137435_noaa_ops.rade9.co.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0315103_e0320507_b02451_c20120418092051137435_noaa_ops.rade9.co.json
globalnightlight/201204/SVDNB_npp_d20120418_t0315103_e0320507_b02451_c20120418092051137435_noaa_ops.rade9.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0320519_e0326323_b02451_c20120418092632275819_noaa_ops.rade9.co.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0320519_e0326323_b02451_c20120418092632275819_noaa_ops.rade9.co.json
globalnightlight/201204/SVDNB_npp_d20120418_t0320519_e0326323_b02451_c20120418092632275819_noaa_ops.rade9.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0326335_e0332139_b02451_c20120418093214412769_noaa_ops.rade9.co.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0326335_e0332139_b02451_c20120418093214412769_noaa_ops.rade9.co.json
globalnightlight/201204/SVDNB_npp_d20120418_t0326335_e0332139_b02451_c20120418093214412769_noaa_ops.rade9.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0332152_e0337555_b02451_c20120418093755509353_noaa_ops.rade9.co.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0332152_e0337555_b02451_c20120418093755509353_noaa_ops.rade9.co.json
globalnightlight/201204/SVDNB_npp_d20120418_t0332152_e0337555_b02451_c20120418093755509353_noaa_ops.rade9.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0337568_e0343354_b02451_c20120418094336640323_noaa_ops.rade9.co.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0337568_e0343354_b02451_c20120418094336640323_noaa_ops.rade9.co.json
globalnightlight/201204/SVDNB_npp_d20120418_t0337568_e0343354_b02451_c20120418094336640323_noaa_ops.rade9.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0343366_e0349170_b02451_c20120418094918746170_noaa_ops.rade9.co.co.tif
However, the two outputs have very different line counts: the CID listing is much, much longer than the file listing.
car ls --unixfs /mnt/pool243/cars/worldbank/baga6ea4seaqbm326hi63viv5pmmc3qn4ekt7veu7qdgcdpkcxtu7isao2wllkdi.car | wc -l
155
car ls /mnt/pool243/cars/worldbank/baga6ea4seaqbm326hi63viv5pmmc3qn4ekt7veu7qdgcdpkcxtu7isao2wllkdi.car | wc -l
10393
Is this wrong? A file, especially a big one, will often be made up of many chunks, each with its own CID.
Sorry, no it's not wrong. Just thinking how to get the CID for each file. Not sure how to map them.
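The point about chunking can be illustrated with a small sketch. It splits a byte string into fixed-size chunks and computes a CIDv1 (raw codec, sha2-256, base32) for each one, which is why a plain CID listing has far more lines than a file listing. The 1 MiB chunk size is an assumption taken from the `[1.0 MB]` links pasted in this thread, and `raw_cid` / `leaf_cids` are illustrative names, not part of go-car. A real UnixFS file also has one or more dag-pb nodes linking these leaves together, which this sketch does not build.

```python
import base64
import hashlib

# Assumption: 1 MiB leaves, matching the "[1.0 MB]" links quoted in this thread.
# The actual size depends on the chunker used when the CAR was packed.
CHUNK_SIZE = 1024 * 1024


def raw_cid(block: bytes) -> str:
    """Return the CIDv1 string for a raw block hashed with sha2-256."""
    digest = hashlib.sha256(block).digest()
    # 0x01 = CID version 1, 0x55 = raw codec,
    # 0x12 = sha2-256 multihash code, 0x20 = digest length (32 bytes)
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    # Multibase base32 (lowercase, unpadded) uses the prefix "b".
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


def leaf_cids(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size chunks and return one CID per chunk."""
    return [
        raw_cid(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


# A 2.5 MiB stand-in for a file: one "name", but three leaf blocks.
example_file = b"".join(i.to_bytes(4, "big") for i in range(655360))
for cid in leaf_cids(example_file):
    print("raw:", cid)
```

So a single name in the `--unixfs` listing corresponds to one root CID plus however many leaf (and intermediate) blocks the file was split into.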
you can also look at the verbose flag to print out the types of the different CIDs, and I believe that will print out the items in a unixfs directory as well
@willscott, I'm trying to get the same details as the OP: the file or directory name alongside the CID. While the --verbose flag does provide that context, it doesn't print all the links. Below is an example of running the car list command:
dag-pb: bafybeifurw2z3xmkrflcqy2xfpr34ys7dneodveyeniov7voiss5azkdhu
10 links. 2 bytes
GEDI04_B_MW019MW223_02_002_02_R01000M_MI.tif[14 MB] bafybeiakure7paspbvxqfj4b64svy3vtxqjc72kww6doykphaq63w23ctu
GEDI04_B_MW019MW223_02_002_02_R01000M_MU.tif[503 MB] bafybeiekmgjua5qvlm3zrnkqj54mnd2y764hk42wdthrk4gfwgjgxvqepi
GEDI04_B_MW019MW223_02_002_02_R01000M_NC.tif[88 MB] bafybeiduzczefsght5ep52tjkrgh6ylpyp2h5cveyh5pgbnxfubyrdvlxy
(10 total)
Unixfs Directory
Unfortunately, the other 7 links are not printed out. I noticed here that the max lines printed is set to 3.
Here's another example. In this case, all of the raw blocks are printed above the dag-pb node they're linked from.
raw: bafkreigex65obvg2cuxpd4dp5vvxdfdvon3vj4ctwubn3al7dweyc76bqy
raw: bafkreigdhkqsdcyxbj4wbnfu7bcijpc4ahrqvexycxrwrsnhfcjvuiysuq
raw: bafkreiarcvjdnhlvxx2iwdadubwmfbvqjd34h2gpyrpcftqbxnhvm6bpvu
raw: bafkreiedi4c6rebigtkz7pco4vbas3ikmezrec4cojyyh5f3733mtx5zxu
raw: bafkreigeol6iuim7a6emge22otwmhnshyvm6skjmcjhachle5mqtu47uum
raw: bafkreihn5hdzqcx4c7j5f3a34izptpkkfesvxctixxkcszgm6e3gm4q7y4
raw: bafkreihlkkvxchsdjyis2uvh6lv3tgvbhj6fomh5y2x4ro7dkl6zr2xr5m
raw: bafkreifeif6sv3yfdf4yxypmq5z67vnrvrfld22hro6shzan42e73fawri
raw: bafkreihwe4ismovddja343akjs4uk5ilmhdjktnbhc3oryspub6ffz7g5a
raw: bafkreia6boirg2imm2pdypyb45kznzw47dxhav25c2kbf5m5uwkavwllee
raw: bafkreidmyxmzlr3pljpuekbrsmz2i7orwwgjztwea63mukgn5cc3tinpwa
raw: bafkreia4ge2wl2mfez7h3cyhkcp7qmqdwtnspdbyjikugvugscly4hf2je
raw: bafkreifqm7bwo6fdl47fj6dnys4refmr7sd5354ac3wq7atxktrqgpivym
raw: bafkreif6oifgjtaiyvzbb2lpok6lejz2lu2i4tytcvnz2ttfrjszwa4xt4
raw: bafkreiaslirngvag4oa7afcako625rbzjae2yayrg33w6r2qme6wgwh6hy
dag-pb: bafybeigz7sxvjzp6463til5bysvortvyiwkxzopaslrbbxzebaaytajh4a
15 links. 67 bytes
[1.0 MB] bafkreigex65obvg2cuxpd4dp5vvxdfdvon3vj4ctwubn3al7dweyc76bqy
[1.0 MB] bafkreigdhkqsdcyxbj4wbnfu7bcijpc4ahrqvexycxrwrsnhfcjvuiysuq
[1.0 MB] bafkreiarcvjdnhlvxx2iwdadubwmfbvqjd34h2gpyrpcftqbxnhvm6bpvu
(15 total)
Unixfs File
Ideally, the output would look like this, which would be much more readable and would match the format used when listing file/directory names:
dag-pb: bafybeigz7sxvjzp6463til5bysvortvyiwkxzopaslrbbxzebaaytajh4a
15 links. 67 bytes
raw [1.0 MB] bafkreigex65obvg2cuxpd4dp5vvxdfdvon3vj4ctwubn3al7dweyc76bqy
raw [1.0 MB] bafkreigdhkqsdcyxbj4wbnfu7bcijpc4ahrqvexycxrwrsnhfcjvuiysuq
raw [1.0 MB] bafkreiarcvjdnhlvxx2iwdadubwmfbvqjd34h2gpyrpcftqbxnhvm6bpvu
raw [1.0 MB] bafkreiedi4c6rebigtkz7pco4vbas3ikmezrec4cojyyh5f3733mtx5zxu
raw [1.0 MB] bafkreigeol6iuim7a6emge22otwmhnshyvm6skjmcjhachle5mqtu47uum
raw [1.0 MB] bafkreihn5hdzqcx4c7j5f3a34izptpkkfesvxctixxkcszgm6e3gm4q7y4
raw [1.0 MB] bafkreihlkkvxchsdjyis2uvh6lv3tgvbhj6fomh5y2x4ro7dkl6zr2xr5m
raw [1.0 MB] bafkreifeif6sv3yfdf4yxypmq5z67vnrvrfld22hro6shzan42e73fawri
raw [1.0 MB] bafkreihwe4ismovddja343akjs4uk5ilmhdjktnbhc3oryspub6ffz7g5a
raw [1.0 MB] bafkreia6boirg2imm2pdypyb45kznzw47dxhav25c2kbf5m5uwkavwllee
raw [1.0 MB] bafkreidmyxmzlr3pljpuekbrsmz2i7orwwgjztwea63mukgn5cc3tinpwa
raw [1.0 MB] bafkreia4ge2wl2mfez7h3cyhkcp7qmqdwtnspdbyjikugvugscly4hf2je
raw [1.0 MB] bafkreifqm7bwo6fdl47fj6dnys4refmr7sd5354ac3wq7atxktrqgpivym
raw [1.0 MB] bafkreif6oifgjtaiyvzbb2lpok6lejz2lu2i4tytcvnz2ttfrjszwa4xt4
raw [1.0 MB] bafkreiaslirngvag4oa7afcako625rbzjae2yayrg33w6r2qme6wgwh6hy
Unixfs File
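As a stopgap, the verbose listing quoted above can be parsed into name/size/CID rows. The sketch below assumes the line shapes seen in the paste (`<codec>: <cid>`, `N links. M bytes`, `name[size] cid`, `(N total)`, `Unixfs <kind>`); that layout is inferred from this thread only and is not a documented, stable interface. `parse_verbose` and `is_truncated` are hypothetical helper names. The parser only sees links that were actually printed, so it also flags nodes whose listing was cut short.

```python
import re

# Line shapes inferred from the verbose output pasted in this thread (assumption).
HEADER = re.compile(r"^(?P<codec>[\w-]+): (?P<cid>\S+)$")
COUNTS = re.compile(r"^(?P<links>\d+) links\. (?P<size>\d+) bytes$")
LINK = re.compile(r"^(?P<name>.*)\[(?P<size>[^\]]+)\]\s+(?P<cid>\S+)$")
KIND = re.compile(r"^Unixfs (?P<kind>\w+)$")

SAMPLE = """\
dag-pb: bafybeifurw2z3xmkrflcqy2xfpr34ys7dneodveyeniov7voiss5azkdhu
10 links. 2 bytes
GEDI04_B_MW019MW223_02_002_02_R01000M_MI.tif[14 MB] bafybeiakure7paspbvxqfj4b64svy3vtxqjc72kww6doykphaq63w23ctu
GEDI04_B_MW019MW223_02_002_02_R01000M_MU.tif[503 MB] bafybeiekmgjua5qvlm3zrnkqj54mnd2y764hk42wdthrk4gfwgjgxvqepi
GEDI04_B_MW019MW223_02_002_02_R01000M_NC.tif[88 MB] bafybeiduzczefsght5ep52tjkrgh6ylpyp2h5cveyh5pgbnxfubyrdvlxy
(10 total)
Unixfs Directory
"""


def parse_verbose(text):
    """Parse verbose listing text into a list of node dicts."""
    nodes = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()  # tolerate indentation lost or kept in a paste
        if not line:
            continue
        link = LINK.match(line)
        header = HEADER.match(line)
        counts = COUNTS.match(line)
        kind = KIND.match(line)
        if link and current is not None:
            # Unnamed links (file chunks) yield an empty name.
            current["links"].append(
                (link["name"].strip(), link["size"], link["cid"])
            )
        elif header:
            current = {"codec": header["codec"], "cid": header["cid"],
                       "kind": "", "declared_links": 0, "links": []}
            nodes.append(current)
        elif counts and current is not None:
            current["declared_links"] = int(counts["links"])
        elif kind and current is not None:
            current["kind"] = kind["kind"]
        # Anything else, such as "(10 total)", is ignored.
    return nodes


def is_truncated(node):
    """True when fewer links were printed than the node declares."""
    return node["declared_links"] > len(node["links"])


for node in parse_verbose(SAMPLE):
    for name, size, cid in node["links"]:
        print(cid, size, name)
    if is_truncated(node):
        missing = node["declared_links"] - len(node["links"])
        print(f"warning: {missing} links of {node['cid']} were not printed")
```

For the directory above this yields three rows and a warning about the seven links that were not printed, so it cannot replace a listing that shows every link.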
@SethDocherty - is what you want a machine parse-able listing of all of the unixfs 'named' items in a car? do you want the raw block cids within the unixfs files as well?
Apologies for the delay in responding. Hope you had the chance to enjoy the holidays!
In the hopes of answering your question, let me provide some auxiliary details on what I'm trying to do with go-car.
I'm building out a workflow using Singularity v2 to pack content into CARs and to perform partial extractions. Singularity lacks the capability to retrieve details about the content found inside CARs, which is why I'm using go-car. I can get the CIDs for all the unixfs 'named' items (e.g. directories and files) with the car list command, cherry-pick the CIDs of the content I want to extract, and then pass them into the Extract Car command. So based on my workflow, I'm really only interested in the CIDs of the unixfs 'named' items. But later down the road I could see a case where I'd be interested in the raw block CIDs within a unixfs file.
Side note: I could use go-car to extract content, but Singularity (which is the tool I'm focusing on for this workflow) creates a 'manifest' CAR file containing the CIDs of all the unixfs 'named' items. My understanding of go-car is that the extract command will only extract content if all the CID references are in a single CAR file.
@SethDocherty see if https://github.com/ipld/go-car/pull/514 does what you need
@willscott, Thanks! Just tested it on the CAR generated through Singularity and it renders two columns with the CID and unixfs path for each item.
While I do prefer the additional details that come with the verbose flag such as the filesize and counts, this gets me what I need.
Hello, right now the car ls command only lists CIDs. Is it possible to have it print the relative path and filename as well?
For example car ls baga6ea4seaqab7qam2an2mkzssn7vioorrcxhaxxszz6k4t6mscwnhvfjj4hoaq.car