jarvist / migovecsurveydata

Repository for the Tolminski Migovec exploration project Cave Survey data

Copy images out of google drive links into a separate directory #4

Closed goatchurchprime closed 4 years ago

goatchurchprime commented 4 years ago

There are a couple in this file: https://github.com/jarvist/migovecsurveydata/blob/master/migovecsurveydata/sysmig/level2/rockroll.svx

;*ref https://drive.google.com/open?id=0BxkZAUc06EK5OHZrd2xNRmtEXzg
;*ref https://drive.google.com/open?id=0BxkZAUc06EK5ekxhcUhaUE5HVHc

The first is available, and the second isn't. There has to be a better place for these images.
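As a starting point for finding every such link, here is a minimal Python sketch that pulls Google Drive URLs out of `;*ref` comment lines. The function name `find_drive_refs` and the regular expression are illustrative assumptions, not anything Survex itself provides; the pattern only covers the `;*ref <url>` form shown above.

```python
import re

# Matches comment lines of the form ";*ref <url>" whose URL is on drive.google.com.
# The pattern is an assumption based on the two example lines above.
DRIVE_REF = re.compile(
    r"^\s*;\s*\*ref\s+(https?://drive\.google\.com/\S+)", re.MULTILINE
)

def find_drive_refs(svx_text):
    """Return the Google Drive URLs found in ';*ref' comment lines."""
    return DRIVE_REF.findall(svx_text)

sample = """\
*begin rockroll
;*ref https://drive.google.com/open?id=0BxkZAUc06EK5OHZrd2xNRmtEXzg
;*ref https://drive.google.com/open?id=0BxkZAUc06EK5ekxhcUhaUE5HVHc
*end rockroll
"""

for url in find_drive_refs(sample):
    print(url)
```

Running this over every `.svx` file in the repo would give the list of images that need a new home.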

In the future it should be possible to handle binary blobs in git properly, and might be a feature available on http://cave-registry.org.uk/ soon.

jarvist commented 4 years ago

@clewingriffith - what do you reckon? These are your scans + extra referencing. I guess it makes sense to have them somewhere authoritative. Would all the PNGs over-weigh this survey repo? Shall we start a new one for images? Or put them on the caving club website in their own subdirectory in the photo_archive ?

clewingriffith commented 4 years ago

So, when I first added them, Google Drive was the only place I thought was possible for a large number of largish binary files. Since then, I think GitHub Pages appeared, making that also a possible location for non-sensitive data that can be publicly available in read-only form. I wouldn't want to dump all these into the same repository as migovecsurveydata as it would make cloning a nightmare. But we could create a sister repo and put them in there. We would then have much more control over the structure and not have to use the awful hashed links that are needed for public access to Google Drive docs (and I've gone sour on Google generally).

I will see if I can look at doing some sort of transitioning, and maybe at the same time, pick up the effort on scanning these in and uploading them. Seeing the original survey sheets is often the only way to resolve issues when somebody has decided to 'change' the data after the fact. I would probably keep survey data separate from photos.

On Wed, 2 Oct 2019, 23:51 Jarvist Moore Frost, notifications@github.com wrote:

Assigned #4 https://github.com/jarvist/migovecsurveydata/issues/4 to @clewingriffith https://github.com/clewingriffith.

goatchurchprime commented 4 years ago

The http://cave-registry.org.uk/ was set up to host these sorts of things. It's svn, which is better at handling binary blobs and is more caver-proof than git (only has checkin and checkout and no branches).

Subject to the problem of cavers, the right answer is probably git-large-file-storage, since pure git is really for text files. I heard @wookey or @andrewatkinson was supposed to be looking into this, but I don't know how far it's got.

You then have the style of separating out the rawscans from the survey data into two separate directory trees, so you don't have to download the big stuff to get the 3D data, like here: http://cave-registry.org.uk/svn/NorthernEngland/ThreeCountiesArea/

Or you can intermingle it and put all the data from each cave into a single directory including all the fat raw scans, so that the 3D survey data for the entire area is not separable, like here: http://www.cave-registry.org.uk/svn/CheddarCatchment/ManorFarm/

The cave-registry has password protected areas.

CUCC-Austria puts all its survey data behind a password protected directory in git/mercurial, but the scans are all in an accessible directory, which can be rsync-ed as necessary. http://expo.survex.com/expofiles/surveyscans/2018/2018%2337/notes-1.jpg

There is no right answer, and the best option is to look at how everyone else is doing things, and pick which one seems to be working in the long term.

jarvist commented 4 years ago

I think a second Git(Hub) repo makes the most sense. I think repos used to be vaguely limited to 1GB. We should probably figure out what sort of file size is sufficient for the scans & confirm that we'll end up with something sensible in the limit that we scan everything. It would be nice to have an ImageMagick convert script that downsamples nicely to whatever compressed PNG / JPEG resolution and settings work.
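A rough sketch of what such a script might look like, in Python wrapping the ImageMagick `convert` command. The `max_px` and `quality` defaults are placeholder assumptions and would need checking against real scans; the helper names are made up for illustration, and newer ImageMagick installs may prefer the `magick` command over `convert`.

```python
import subprocess
from pathlib import Path

def convert_command(src, dst, max_px=2000, quality=85):
    """Build an ImageMagick command line that shrinks a scan and writes it to dst.

    max_px and quality are untested placeholder values.
    """
    return [
        "convert", str(src),
        "-resize", f"{max_px}x{max_px}>",  # '>' means: only shrink larger images
        "-quality", str(quality),          # JPEG/PNG compression setting
        "-strip",                          # drop embedded profiles and comments
        str(dst),
    ]

def downsample(src, dst, **kwargs):
    """Run the conversion, creating the output directory if needed."""
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(convert_command(src, dst, **kwargs), check=True)

# Show the command without running it.
print(" ".join(convert_command("scan.png", "out/scan.jpg")))
```

Building the command separately from running it makes it easy to try out settings on one file before converting a whole year.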

The scanned files very nicely lend themselves to a hierarchical file structure, e.g. /2018/nameofsurvey_page1.png etc.

I'm not sure what makes most sense for the 'link' from the Survex .svx files. Does it expect a URL? Github pages would be nice because then the URL is fairly short + clickable, without the .git SHA hash. But perhaps just a full file-name from the root of the survey-scans repo is most future proof. (But not instantly clickable.)
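To illustrate the two options, here is a small Python sketch that keeps the repo-relative file name as the stored reference and derives a clickable URL from it only when needed. It assumes a GitHub Pages project site served from the repo root at the default `https://<owner>.github.io/<repo>/` address; the owner and repo names below are hypothetical.

```python
from urllib.parse import quote

def pages_url(owner, repo, rel_path):
    """Build a GitHub Pages project-site URL for a path relative to the repo root.

    Assumes the site is published from the repo root on the default domain.
    """
    return f"https://{owner}.github.io/{repo}/{quote(rel_path)}"

# Hypothetical owner and repo names, for illustration only.
rel = "2018/nameofsurvey_page1.png"
print(pages_url("someuser", "survey-scans", rel))
```

That way the .svx files only carry the short path, and the hosting location can change without touching them.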

goatchurchprime commented 4 years ago

In CUCC-austria we have a ";ref 2008#45" which tells us what year and wallet the scans are in. (On the computer it is actually in the directory "rawscans/2008/2008#45").

These directories -- one-per-trip -- include the underground notes and the clean drawn up transcriptions that are used for tracing into the computer. It's not a hard-link in the survex file, and it's more for a reference to where the work is done.

Each numbered wallet gets the underground notes first; we print out blank survey centrelines and store them in there till someone draws them up. Scanning happens progressively.

So my recommendation is one directory per year, with subdirectories for the trips (because there are numerous scans that need to be kept together). Use jpg, not png (you don't notice the quality). Scans of underground notes should be made in colour so you can tell the brown mud apart from the grey pencil marks.
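As a sketch of how the ";ref 2008#45" convention maps onto that layout, the Python below turns a reference line into a "rawscans/&lt;year&gt;/&lt;year&gt;#&lt;wallet&gt;" directory. The function name and regular expression are assumptions for illustration. Note that "#" has to be percent-encoded as "%23" if the path ends up in a URL.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import quote

# Matches ";ref 2008#45" (and ";*ref 2008#45"); the pattern is an assumption.
WALLET_REF = re.compile(r";\s*\*?ref\s+(\d{4})#(\d+)")

def wallet_dir(line, root="rawscans"):
    """Return the scan directory for a wallet reference line, or None."""
    m = WALLET_REF.search(line)
    if m is None:
        return None
    year, wallet = m.groups()
    return PurePosixPath(root) / year / f"{year}#{wallet}"

d = wallet_dir(";ref 2008#45")
print(d)              # rawscans/2008/2008#45
print(quote(str(d)))  # rawscans/2008/2008%2345
```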

Oh, and when people use paperless surveying, I like to print out the drawings onto paper in their own wallet as well as keep a bitmapped rendering of the drawing in one of these directories so you can keep track of things. There's usually a lot of shuffling round and going through the pile at the end of the year to make sure nothing has gone missing and everyone has done their drawing up work. These printouts of the toporobot drawings serve as a place-holder, and you never know what could go wrong.

Put this tree of directories of scans anywhere you can copy it from. The tunnelx sketch-files have semi-relative pathnames embedded in them so you can pull up the scan of the redrawn survey under any bit of passage. Notice how all the scans are cockeyed by the declination, since there was no fix in any of these deep fragments of cave to inform the declination auto when they were rendered.

image

wookey commented 4 years ago

On 2019-10-03 15:31 +0000, Julian Todd wrote:

Subject to the problem of cavers, the right answer is probably git-large-file-storage, since pure git is really for text files. I heard @wookey or @AndrewAtkinson was supposed to be looking into this, but I don't know how far it's got.

I was planning to use git-annex, but git-lfs is much the same thing I think.

CUCC-Austria puts all its survey data behind a password protected directory in git/mercurial but the scans are all in an accessible directory, which can be

There is no password hiding access to CUCC data - it's in the open, same as the scans.

But obviously a group can do whatever it sees fit.

Wookey -- Principal hats: Linaro, Debian, Wookware, ARM http://wookware.org/

AndrewAtkinson commented 4 years ago

Issue is now fixed, thank you.

clewingriffith commented 4 years ago

I've copied all the scans to a repo clewingriffith/migovecsurveybooks and repointed all the references at the github-pages links. There weren't any broken links, but it's possible that something was up with the way some of them were shared. They were always structured, but this wasn't visible until now. In practice, git doesn't really have a problem with JPGs -- the whole debate about git lfs and binaries only really comes into play if you are making frequent changes to binaries, or if they are actually large, and 1 MB isn't really large.
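For anyone wanting to check sizes on their own checkout, a minimal Python sketch that counts the scans under a directory and reports the total and the largest file. The suffix list and the function name are assumptions; it says nothing about any hosting limit, it just adds up bytes.

```python
from pathlib import Path

def scan_sizes(root, suffixes=(".jpg", ".jpeg", ".png")):
    """Return (file_count, total_bytes, largest_bytes) for image files under root."""
    sizes = [
        p.stat().st_size
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    ]
    return len(sizes), sum(sizes), max(sizes, default=0)

# Report on the current directory.
count, total, largest = scan_sizes(".")
print(f"{count} scans, {total / 1e6:.1f} MB total, largest {largest / 1e6:.1f} MB")
```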