qri-io / qri

you're invited to a data party!
https://qri.io
GNU General Public License v3.0

dsfs.CreateDataset doesn't construct IPFS DAG links for readme & viz components #1250

Open b5 opened 4 years ago

b5 commented 4 years ago

HEAD currently isn't constructing DAG links to readme components.

steps to reproduce:

$ qri add nyc-transit-data/turnstile_daily_counts_2020
$ qri log
# make sure latest hash is /ipfs/QmaZJjtbSQVgeZSAamP4UpvVvDG7x7wHntAKSFbtxfssdU
$ qri get body nyc-transit-data/turnstile_daily_counts_2020

If you're running qri connect, that last command will just hang. If you're offline, you'll get merkledag: not found.

Running qri get body nyc-transit-data/turnstile_daily_counts_2020 --log-all while offline trips the debug call on line 40 of base.OpenDataset:

https://github.com/qri-io/qri/blob/c21419b3a0df46e69c923ffcc8bcdbeb2690f31e/base/dataset.go#L38-L43

That means qri can't find a hash it needs. Let's use IPFS to investigate.

$ ipfs ls QmaZJjtbSQVgeZSAamP4UpvVvDG7x7wHntAKSFbtxfssdU
QmdG8Mmdf93SPTAgp3yECpPzxCh3c8mMMpQ79e7d1Kgrr3 3364673 body.csv
QmQnPByQcqbzZeAotRHp4ZRKEsjcsokWm57xSzVeSh3XzA 560     commit.json
QmZLJBWBE8uvrEiHWDLMXuo1yD3uqPZbDPJmKpdSfjXyeg 526     dataset.json
QmXiQDjwVtnzNx2sNoAPAGEmfuYEJDiZnKBRVkKbzkDdZV 243     meta.json
QmXU7gg17hyErNyo5Cr8ASJcbadFFEyMfMaaKLB7rYcj8R 82      readme.md
QmRtMJFJRtD5SVGSYDaEewxD46KKuMkUzXh1mc6EuwMfy2 2030    structure.json
QmbbuH9CNfCxGbq15i2Zu8919euie8CDgD4e6NNWhYncXw 168     viz.json

OK, listing links shows a readme.md file, which should be the file we need. So what gives? Well, base.OpenDataset looks at dataset.readme.scriptPath to find the readme's IPFS path. Let's look at it:

$ ipfs cat QmaZJjtbSQVgeZSAamP4UpvVvDG7x7wHntAKSFbtxfssdU/dataset.json | jq
{
  "bodyPath": "/ipfs/QmdG8Mmdf93SPTAgp3yECpPzxCh3c8mMMpQ79e7d1Kgrr3",
  "commit": "/ipfs/QmQnPByQcqbzZeAotRHp4ZRKEsjcsokWm57xSzVeSh3XzA",
  "meta": "/ipfs/QmXiQDjwVtnzNx2sNoAPAGEmfuYEJDiZnKBRVkKbzkDdZV",
  "peername": "nyc-transit-data",
  "previousPath": "/ipfs/QmZiBjhne2H76oqJzHyUtTKF6TfeY9k6w7EYuYtqDbcCvu",
  "readme": {
    "qri": "rm:0",
    "scriptPath": "/ipfs/QmYZMAGE5VVDDGpEj7wWQgQAyBM76Q31iSXm15D8YqR9Jd"
  },
  "qri": "ds:0",
  "structure": "/ipfs/QmRtMJFJRtD5SVGSYDaEewxD46KKuMkUzXh1mc6EuwMfy2",
  "viz": "/ipfs/QmbbuH9CNfCxGbq15i2Zu8919euie8CDgD4e6NNWhYncXw"
}

So scriptPath points to QmYZMAGE5VVDDGpEj7wWQgQAyBM76Q31iSXm15D8YqR9Jd, but the hash of readme.md (the script) is QmXU7gg17hyErNyo5Cr8ASJcbadFFEyMfMaaKLB7rYcj8R. Those are different, which is bad: the dataset references a readme that isn't one of its own DAG links.
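The hash comparison above is easy to automate, which is roughly what a regression test for this bug would need. Below is a minimal Go sketch (Go because qri itself is written in Go) that parses dataset.json and reports every component reference whose hash is not one of the DAG's direct links as printed by ipfs ls. The datasetRefs type, the danglingRefs and ipfsHash helpers, and the rule "every component reference must be a direct link" are assumptions for illustration, not qri's actual API; the struct only models the JSON shape shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// datasetRefs models only the component references in dataset.json that
// (by assumption) should resolve to direct links of the dataset's DAG.
// previousPath is left out: it points at the prior version, not a child link.
type datasetRefs struct {
	BodyPath  string `json:"bodyPath"`
	Commit    string `json:"commit"`
	Meta      string `json:"meta"`
	Structure string `json:"structure"`
	Viz       string `json:"viz"`
	Readme    struct {
		ScriptPath string `json:"scriptPath"`
	} `json:"readme"`
}

// ipfsHash strips the "/ipfs/" prefix so a stored path can be compared with
// the bare hashes printed by `ipfs ls`.
func ipfsHash(path string) string {
	return strings.TrimPrefix(path, "/ipfs/")
}

// danglingRefs returns the names of components whose referenced hash is not
// among the DAG's direct links. links maps link hash -> link name.
// Hypothetical helper, not part of qri.
func danglingRefs(datasetJSON []byte, links map[string]string) ([]string, error) {
	var ds datasetRefs
	if err := json.Unmarshal(datasetJSON, &ds); err != nil {
		return nil, err
	}
	refs := []struct{ name, path string }{
		{"bodyPath", ds.BodyPath},
		{"commit", ds.Commit},
		{"meta", ds.Meta},
		{"structure", ds.Structure},
		{"viz", ds.Viz},
		{"readme.scriptPath", ds.Readme.ScriptPath},
	}
	var dangling []string
	for _, r := range refs {
		if r.path == "" {
			continue // component absent from this version
		}
		if _, ok := links[ipfsHash(r.path)]; !ok {
			dangling = append(dangling, r.name)
		}
	}
	return dangling, nil
}

func main() {
	// Direct links copied from the `ipfs ls` output above.
	links := map[string]string{
		"QmdG8Mmdf93SPTAgp3yECpPzxCh3c8mMMpQ79e7d1Kgrr3": "body.csv",
		"QmQnPByQcqbzZeAotRHp4ZRKEsjcsokWm57xSzVeSh3XzA": "commit.json",
		"QmZLJBWBE8uvrEiHWDLMXuo1yD3uqPZbDPJmKpdSfjXyeg": "dataset.json",
		"QmXiQDjwVtnzNx2sNoAPAGEmfuYEJDiZnKBRVkKbzkDdZV": "meta.json",
		"QmXU7gg17hyErNyo5Cr8ASJcbadFFEyMfMaaKLB7rYcj8R": "readme.md",
		"QmRtMJFJRtD5SVGSYDaEewxD46KKuMkUzXh1mc6EuwMfy2": "structure.json",
		"QmbbuH9CNfCxGbq15i2Zu8919euie8CDgD4e6NNWhYncXw": "viz.json",
	}
	// Reference fields from the dataset.json shown above.
	datasetJSON := []byte(`{
  "bodyPath": "/ipfs/QmdG8Mmdf93SPTAgp3yECpPzxCh3c8mMMpQ79e7d1Kgrr3",
  "commit": "/ipfs/QmQnPByQcqbzZeAotRHp4ZRKEsjcsokWm57xSzVeSh3XzA",
  "meta": "/ipfs/QmXiQDjwVtnzNx2sNoAPAGEmfuYEJDiZnKBRVkKbzkDdZV",
  "readme": {"qri": "rm:0", "scriptPath": "/ipfs/QmYZMAGE5VVDDGpEj7wWQgQAyBM76Q31iSXm15D8YqR9Jd"},
  "structure": "/ipfs/QmRtMJFJRtD5SVGSYDaEewxD46KKuMkUzXh1mc6EuwMfy2",
  "viz": "/ipfs/QmbbuH9CNfCxGbq15i2Zu8919euie8CDgD4e6NNWhYncXw"
}`)
	dangling, err := danglingRefs(datasetJSON, links)
	if err != nil {
		panic(err)
	}
	fmt.Println(dangling) // [readme.scriptPath]
}
```

Comparing against direct links only is deliberately strict: a scriptPath that might resolve somewhere else on the network still counts as dangling, which is exactly the case that hangs under qri connect and fails with merkledag: not found offline.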

Because we've cloned this dataset from somewhere else, we don't have QmYZMAGE5VVDDGpEj7wWQgQAyBM76Q31iSXm15D8YqR9Jd in our local repo.

Finally, let's look at the contents of readme.md in that DAG:

$ ipfs cat QmaZJjtbSQVgeZSAamP4UpvVvDG7x7wHntAKSFbtxfssdU/readme.md
{"qri":"rm:0","scriptPath":"/ipfs/QmYZMAGE5VVDDGpEj7wWQgQAyBM76Q31iSXm15D8YqR9Jd"}

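... that doesn't look like a readme file 🤷‍♀️. Instead of the markdown script, readme.md holds the JSON-encoded readme component itself, complete with the same wrong scriptPath. Time to go digging around in base/dsfs/dataset.go.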
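A check for this symptom is cheap to write: decode the bytes stored under readme.md and see whether they carry the readme component's "qri" kind marker. Below is a minimal Go sketch; looksLikeReadmeComponent is a hypothetical helper, and treating an "rm:" prefix on the qri field as the readme component's marker is an assumption drawn from the "rm:0" value shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// looksLikeReadmeComponent reports whether data is a JSON object whose "qri"
// field carries the readme component kind (assumed "rm:" prefix), i.e. the
// serialized component rather than the markdown script it points to.
// Hypothetical helper, not part of qri.
func looksLikeReadmeComponent(data []byte) bool {
	var probe struct {
		Qri string `json:"qri"`
	}
	if err := json.Unmarshal(data, &probe); err != nil {
		return false // not a JSON object, so plausibly real markdown
	}
	return strings.HasPrefix(probe.Qri, "rm:")
}

func main() {
	// Bytes returned by `ipfs cat .../readme.md` above.
	fromDAG := []byte(`{"qri":"rm:0","scriptPath":"/ipfs/QmYZMAGE5VVDDGpEj7wWQgQAyBM76Q31iSXm15D8YqR9Jd"}`)
	// What a readme script would plausibly look like instead (made-up content).
	markdown := []byte("# turnstile daily counts\n")

	fmt.Println(looksLikeReadmeComponent(fromDAG))  // true
	fmt.Println(looksLikeReadmeComponent(markdown)) // false
}
```

Non-JSON input is treated as a genuine script, so this only catches the specific mix-up seen here, not every malformed readme.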

We've run into this problem before with default viz hashes not getting pinned, and have added hacks like this to get around it:

https://github.com/qri-io/qri/blob/c21419b3a0df46e69c923ffcc8bcdbeb2690f31e/base/dataset.go#L45-L58

This is a pervasive problem within our stack, and it's now causing major UX issues. It has also gotten worse in recent revisions, spreading to readme when it used to affect only viz.

Steps to fix:

b5 commented 4 years ago

Landed a bunch of stopgap fixes for this in #1260, but this isn't closable until we have proper tests.