Closed yusefnapora closed 8 years ago
I like it! List may be a good idea too, with names (when present) moving into the objects themselves
Noting this as a protocol-breaking change we should get out the door soon
This has been updated, with the getty stuff removed, and we now write out the asset dictionary even if we don't have the image data and can't create an ipfs link. In other words, if you run with --skip-image-downloads
, it will just output a dict like
{
"uri": "http://cache1.asset-cache.net/gc/JD6484-001-animal-supervision-gettyimages.jpg?v=1&c=IWSAsset&k=2&d=OA3owSIxYcZu6CTE0v2Nq7xkTFD5%2fH8AEc%2bV4t9URzTj3xVJ2%2bN47LqFBOaJ9O2H&b=NDk1",
"binary_asset": true,
"mime_type": "image/jpeg"
}
I changed the origin
field to uri
, since that seems like it makes more sense when there's no ipfs link present.
I'm planning to add back the special-casing of the thumbnail
key, which was removed in this branch a few commits ago. In other words, try to pull the image at read-time, either from ipfs or from the http uri if there's no ipfs link. If successful, hand off the base64 image data to the indexer. This will be compatible with the current indexer code, which expects a thumbnail_base64
key containing the image data.
At that point, I feel like we can go ahead and merge this to prepare for a bulk re-ingestion of the getty data (with --skip-image-downloads
).
However, I'm also going to write up how we best handle image caching and adding ipfs links after the fact, which seems like it needs its own issue.
I'm planning to add back the special-casing of the thumbnail key, which was removed in this branch a few commits ago
I like the generic images with optional "thumbnail: true" property, is there some reason we don't like that?
I like the generic images too; I added the special casing back because we seemed to be in a rush to get images into the indexer, and the quickest way was to give it the output it was already expecting. This branch still supports generic images (like the snippet in the first comment), but it looks for an object at the thumbnail
key and outputs thumbnail_base64
, since that's where the indexer is currently looking.
I'm planning to replace this with "two step" image fetching, so consumers of the canonical_stream
don't need to pay the download cost. But that needs indexer support, so need to coordinate it with @autoencoder
Going to merge this so we can re-ingest on the testnet
This changes the way we write out image links. Instead of just writing the link directly, it's embedded in a wrapper object that has
binary_asset: true
set, and also has the origin url and mime type (if possible).I changed the getty translator and iterator to output all images it can find (so thumb, comp, and preview), all stored in the
meta.data.images
map. so e.g:It might make more sense to use a list instead of a map, to allow unnamed items, etc. That will need some changes to the writer code tho.
This also adds an
.open
method to theIpfsDatastore
andRpcDatastore
classes, which returns a file-like object with the contents of the returned object.I haven't updated the reader api yet, except to remove the old
thumbnail_base64
code. This means that field is no longer present, so if the indexer is counting on it existing, this may break things.The indexer could potentially use this new format as-is by walking the
meta.data
dict and looking forbinary_asset
s. But it may also be worth updating the reader api to do that and collect the binary links in one place for easier iteration.