julianh2o / Podbase

Image database aimed at scientific applications
MIT License

Priority 4: Shorten image URL using search by name #55

Open danhartline opened 11 years ago

danhartline commented 11 years ago

The problem: current addressing for an image in the browser window is unwieldy:

<a href='http://podbase2.pbrc.hawaii.edu/entry/10#/InvertebrateMyelin/tem/Astacidea/P.clarkii/c6vnc/Grid07-04_D8/JHK070718_C6VNC_Grid07-04_D8_slide34c_TEM04s.jpg'>JHK070718_C6VNC_Grid07-04_D8_slide34c_TEM04s.jpg</a>

It is even worse (7 lines each) for displaying the image itself in an HTML summary document (="paper" as Podbase is being developed), e.g.:

<a href='http://podbase2.pbrc.hawaii.edu/data/InvertebrateMyelin/tem/Astacidea/P.clarkii/c6vnc/Grid07-04_D8/JHK070718_C6VNC_Grid07-04_D8_slide34c_TEM04s.jpg'><img src='http://podbase2.pbrc.hawaii.edu/data/InvertebrateMyelin/tem/Astacidea/P.clarkii/c6vnc/Grid07-04_D8/JHK070718_C6VNC_Grid07-04_D8_slide34c_TEM04s.jpg?mode=fit&width=300&brightness=0&contrast=0'></a>

The feature needed is a less cumbersome way (2-3 lines) for browsing and displaying images via the hypertext links than I currently have to use in my HTML summary pages (which eventually will be the "paper" HTML output). HOWEVER, the current ability to browse files in the context of the folder structure is important to retain, as it is key to finding related images, however weirdly they may happen to be named.

Could a URL like this work (2 lines would be better than 3)? I don't know what the constraints might be on format, but something like the following would be really convenient & readable:

<a href='http://podbase.net/browse= JHK070718_C6VNC_Grid07-04_D8_slide34c_TEM04s.jpg'></a>

and

<a href='http://podbase.net/display= JHK070718_C6VNC_Grid07-04_D8_slide34c_TEM04s.jpg?300&0&0'></a>
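
For illustration only, here is a minimal Python sketch of how a name-only URL could be resolved back to a full folder path. Nothing in this thread shows how Podbase would actually do it: `build_name_index` and `resolve` are hypothetical names, and the sketch assumes a list of image paths is already available. It refuses to guess when a name is missing or matches more than one path.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def build_name_index(paths):
    """Map each file name to every full path that ends with it."""
    index = defaultdict(list)
    for path in paths:
        index[PurePosixPath(path).name].append(path)
    return index


def resolve(index, name):
    """Return the single path for `name`; raise if it is missing or ambiguous."""
    matches = index.get(name, [])
    if len(matches) != 1:
        raise LookupError(f"{name!r} matches {len(matches)} paths")
    return matches[0]
```

Because the result is the full path, the browser could still open the image in its folder context, which keeps the folder browsing described above.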
danhartline commented 11 years ago

Although rare, it appears that there are SOME images with non-unique names, such as:

http://podbase2.pbrc.hawaii.edu/entry/10#/InvertebrateMyelin/tem/Caridean/C.septem/Tina/pt/Unstained/TEM01s.jpg

and

http://podbase2.pbrc.hawaii.edu/entry/10#/InvertebrateMyelin/tem/Caridean/C.septem/Tina/pt/Unstained/more unstained/TEM01s.jpg

The problem is that they were created before we instituted the unique naming system. Could a conflict of this sort be detected and the user informed, so that they can rename the image? If this situation is widespread (likely only with images named "Tina" somewhere in the directory), could a renaming script be implemented that adds the name of the folder one level up to the image name? For example, the second of these would be renamed "more%20unstained-TEM01s.jpg".
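
A rough Python sketch of the renaming rule described above. The function name `prefixed_name` is hypothetical, and nothing here touches files on disk; it only computes what the new name would be.

```python
from pathlib import PurePosixPath


def prefixed_name(path):
    """Return the file name prefixed with the name of its immediate parent folder."""
    p = PurePosixPath(path)
    parent = p.parent.name
    # A file with no parent folder keeps its name unchanged.
    return f"{parent}-{p.name}" if parent else p.name
```

For a path ending in `Unstained/more unstained/TEM01s.jpg` this gives `more unstained-TEM01s.jpg`; passing that through `urllib.parse.quote` yields the URL form `more%20unstained-TEM01s.jpg`.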

julianh2o commented 11 years ago

I can write a script to detect duplicate image names for you.

julianh2o commented 11 years ago

I've created a script to detect duplicates and run it on your image folder. Here are the results:

https://dl.dropboxusercontent.com/u/3334782/duplicates.txt

Please review this briefly and let me know whether you want me to work on a way to rename them automatically or would rather handle it yourself. Either way, it shouldn't be a huge barrier to this bug. I could perhaps include a small part of the hash to identify an image when several share a name.
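
The script itself is not included in this thread. As a minimal sketch of one way to detect duplicate names, assuming the images live under a single root folder (the function name `find_duplicate_names` and the choice to skip dot-files are assumptions, not the actual script):

```python
import os
from collections import defaultdict


def find_duplicate_names(root):
    """Return {file name: [paths]} for names that occur more than once under root."""
    by_name = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.startswith("."):
                continue  # assumption: hidden files are not images of interest
            by_name[filename].append(os.path.join(dirpath, filename))
    return {name: sorted(paths) for name, paths in by_name.items() if len(paths) > 1}
```

This only compares names. Two files with the same name may or may not hold the same image.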

danhartline commented 11 years ago

Hi, Julian,

Thanks for the list! It is rather long!! A good part of it, at least, is actually due to the fact that Frederic (and maybe some others) have indeed placed duplicates of the images in different locations. However, there are also a number of images that are indeed different. It is a long list, so it will take me a while to figure out what is what.

Thanks again!

Dad

Daniel K. Hartline Research Professor and Director Bekesy Laboratory of Neurobiology University of Hawaii at Manoa Honolulu, HI 96822 www.pbrc.hawaii.edu/~danh

julianh2o commented 11 years ago

Would you like me to do a duplicate image analysis as well (as opposed to duplicate names)? I can use the hashing technique that I described to you to figure out whether the images are equivalent or just named the same.
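
The hashing technique is not spelled out in this thread. One common approach, sketched here under the assumption that "equivalent" means byte-for-byte identical files, is to group files by a SHA-256 digest of their contents. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_identical(paths):
    """Return lists of paths whose contents are identical (groups of two or more)."""
    by_digest = defaultdict(list)
    for path in paths:
        by_digest[file_digest(path)].append(path)
    return [group for group in by_digest.values() if len(group) > 1]
```

A content hash like this treats a resized or re-saved copy as a different file, so it finds exact copies only.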

danhartline commented 11 years ago

That would be a big help! I was dreading the task of going through your list image by image to sort that out. How much more hassle would that be? If I know which images are exact duplicates, I can delete one of the two on the server without too much trouble. Then if we know which images are intrinsically different, we can probably come up with a straightforward renaming system (like just adding the folder name in front of the image name).

D

julianh2o commented 11 years ago

Here Dan, this contains both sections. At the top "Identical images" are the ones that look the same. Under that, "Duplicate name" indicates items that have the same name as another.

https://dl.dropboxusercontent.com/u/3334782/duplicate_images_and_names.txt

julianh2o commented 11 years ago

Also, I've removed the invisible files from the report.

danhartline commented 11 years ago

Hi, Julian,

This is immensely helpful! At first glance it looks as if whole folders (or parts of them) in the project "Copepod" are identical to folders in the project "InvertebrateMyelin". Unfortunately the duplication isn't consistent. However, it might still be that we can delete whole folders of duplicated images and then manually amalgamate the rest.

 D

PS We seem to be missing the metadata on these images -- which limits their usefulness. Might just dump the whole lot.

danhartline commented 11 years ago

Hi, Julian,

I was thinking about the duplicate images/names issue, and it occurs to me that the only real problem with duplicate IMAGES is that the metadata might not get linked to both images. Might that be not too big a problem if any metadata found could be copied (or appended) to the extra copy? Or one file of metadata shared by two (identical) images?

The only REAL problem is if two identical names belong to DIFFERENT images. Might your script be modified to check for that? Looking over the list of duplicate names, it seems that most are cases where folks put the same image into two different folders (which is annoying -- they were not supposed to do that, but it's too late now). However, there are some bona fide cases where someone just named an image something like "TEM10", since that name was unique within that particular folder. I would think that an auto-rename would be OK there. I have not used many such images in my HTML summary files, so I can probably fix those myself if they were renamed.

 Thanks,

 Dad

julianh2o commented 11 years ago

Because of the way projects and access work, sharing the metadata would cause some problems. If an image is shared, which template should it inherit? Who can edit it? How do we inform the user that the metadata is shared? In general, it seems like a bad idea to maintain duplicate images in your data, and I would advise against it.

It seems to me that each image should have been taken in association with one project, and images from different projects potentially mixed together when creating a paper. You shouldn't need to copy images from one project to another; instead, make a paper that relies on two different projects.
