[Question] How Nomos-v2 is collected? Is there a benchmark on performance?

Luciennnnnnn commented 1 month ago

Hi, I'm curious how Nomos-v2 was collected. Also, is it better than other datasets? Is there a performance benchmark?

neosr-project commented 1 month ago

Hi Lucien, Nomos-v2 was collected mostly manually, but also with the help of HyperIQA, from the other datasets mentioned in the README.

> is it better than other datasets?

We haven't done a fair comparison between datasets since we lack the computational power, but because one of the main goals of Nomos-v2 was to select only good-quality, complex images, we expect it to perform at least as well as bigger datasets. @Phhofm has trained multiple models on the Nomos-v2 dataset, and it seems to achieve perceptually better results than DIV2K or LSDIR. He has also been testing some new approaches to automated dataset distillation, if you're interested. Let me know if you have any other questions :+1:

Luciennnnnnn commented 1 month ago

Hi @muslll, that's very interesting! I tried training a Stable Diffusion-based model; however, the Nomos-v2-trained model performs worse than the LSDIR-trained model (I used 20% of the LSDIR images) on most metrics. By the way, I'm also interested in creating a high-quality dataset for image super-resolution, as it's crucial in many scenarios. I have substantial computational resources available. If you're interested in collaborating, please let me know.

@Phhofm Regarding automated dataset distillation, are you evaluating image complexity using SAM? I noticed the issue you created in the DiverSeg-Dataset repository.

Phhofm commented 1 month ago

Hey, yeah, I might use SAM. I'm currently cooking up another showcase, basically curating the HQ-50K dataset while preparing a YouTube video to go with it, since the YouTube video I once did about dataset preparation is about a year old already.

My current workflow looks more like this

Anyway, I'm still working on it, currently on the blockiness step. Then I'd like to release it along with a YouTube video, to update the old one, so to speak.

At the same time, I also want to create a big tiled HQ SISR dataset this way. This HQ-50K curation is basically just one step toward that.

I just realized I also don't have LR creation in the steps yet. Plus, some steps I use are situational: multiscaling, for example, I only use with initially small datasets, since it basically doubles or triples the number of images in the dataset (blows it up).
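To illustrate how multiscaling inflates a dataset, here is a minimal sketch that lists the downscaled copies a single image would yield. The helper name `multiscale_sizes`, the scale factors, and the 512px minimum side are all assumptions for illustration, not the settings actually used in this workflow.

```python
def multiscale_sizes(width, height, scales=(0.75, 0.5), min_side=512):
    """Return the (width, height) of each downscaled copy worth keeping.

    Hypothetical helper: the scale factors and min_side are illustrative.
    A copy is skipped when its shorter side would fall below min_side,
    because it could no longer provide a full training tile.
    """
    sizes = []
    for scale in scales:
        w, h = round(width * scale), round(height * scale)
        if min(w, h) >= min_side:
            sizes.append((w, h))
    return sizes


# A 2040x1356 image keeps both copies, so it contributes 3 images in total.
copies = multiscale_sizes(2040, 1356)
print(len(copies) + 1, copies)
```

With two scale factors, each sufficiently large image turns into up to three images, which matches the "doubles or triples" growth described above.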

But yeah, most of what I do on this topic is a hobby, for fun, because I find it fascinating. I don't do it in an academic manner; there would still be a lot to do there, like showing empirically what influence multiscaling even has. One could just take DIV2K, prepare a second copy with multiscaling, deterministically train two models in the same manner, and score the outputs. Or, more generally, train multiple network models in the same deterministic manner on the same set, then score them with multiple full-reference metrics, together with their inference times and VRAM requirements, plus visual outputs. That would be far more useful than the current situation, where papers just report some PSNR/SSIM on some datasets using official pretrained models that were not trained deterministically. And a lot of other things.
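The scoring step of such a comparison could look roughly like the sketch below, which ranks the outputs of two models against the same ground truth using PSNR. The pixel values and model names are made up, and images are represented as flat lists of 8-bit values purely to keep the example dependency-free; a real evaluation would use full images and several full-reference metrics.

```python
import math


def psnr(reference, output, max_value=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(output):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - o) ** 2 for r, o in zip(reference, output)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy pixel values standing in for a ground-truth image and two model outputs.
ground_truth = [10, 20, 30, 40]
outputs = {
    "model_a": [12, 18, 33, 40],
    "model_b": [11, 20, 30, 41],
}
scores = {name: psnr(ground_truth, out) for name, out in outputs.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The comparison is only meaningful if both models were trained with identical settings and seeds, so that the dataset is the only variable.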

Ah, I think a diffusion-based model can make use of far more data, i.e. train on a bigger dataset, than most of our SISR networks. Compact collapses if the dataset is too complex (I'm talking about degradations), while a big network like ATD handles it well, or at least that was the case with my realwebphoto datasets. So I wanted to see if I could make a big dataset in the end and train something like a DRCT-L model. I was just thinking about why your SD-based model would perform worse, though I'm a bit surprised that 20% of LSDIR would perform better than Nomos-v2. It's simply that I had seen huge datasets being used for SD-based training, while we sometimes used 3k images for small networks and 6k for medium (or big) networks and still got good results.

neosr-project commented 1 month ago

> I tried training a Stable Diffusion-based model; however, the Nomos-v2-trained model performs worse than the LSDIR-trained model (I used 20% of the LSDIR images) on most metrics.

Interesting. We normally train CNNs or Transformers, so nobody has tried it on diffusion yet, afaik. I'd say this is likely due to it being cropped to 512x512px, which would hurt image generation (in contrast to SISR) since context could be lost.

> If you're interested in collaborating, please let me know.

Sure, let me know if you want to start a new project. We could collaborate on proving it is superior to other standard datasets; neosr makes that easy through deterministic training, something neglected by most research papers. Efficient networks like SPAN could be used in such a test; on an A100 it would probably take only a few hours.

Luciennnnnnn commented 1 month ago

I'd like to clarify that my diffusion-based model is like StableSR: it is trained with Real-ESRGAN degradation and tested on both synthetic and real-world datasets (RealSR and DRealSR). Because of the train/test distribution mismatch, this setting may not be very suitable for verifying Nomos-v2.

I'll try it with a CNN/Transformer architecture in the bicubic setting later.

> I'd say this is likely due to it being cropped to 512x512px, which would hurt image generation (in contrast to SISR) since context could be lost.

I think cropping is fine, since I also crop 512 * 512 patches from LSDIR for training.

Luciennnnnnn commented 1 month ago

@Phhofm I think multi-scaling is beneficial even for large datasets, since it diversifies the scales of objects within single patches, and test images come at different scales.

What's more, I recommend Q-Align as a no-reference metric; it is strong on benchmarks. I have tested it, and it works well for eliminating bad images in LSDIR.

neosr-project commented 1 month ago

> I think cropping is fine, since I also crop 512 * 512 patches from LSDIR for training.

I see, so that is not the issue :thinking: Maybe it doesn't work well for diffusion for some reason; like I said, it was built by testing on CNNs/Transformers only. But let me know if you run any new tests, I'm interested in the results.

> I think multi-scaling is beneficial even for large datasets

I think there's a trade-off here. If too much multi-scaling is applied (for example, on the whole dataset, with 2x, 4x, and 8x scales), it could increase susceptibility to overfitting, since it would repeat the 'same' image multiple times. So applying it to a percentage of the dataset instead (like 25%-50%) would probably be better.
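A minimal sketch of that idea: pick a fixed, reproducible fraction of the images and multi-scale only those. The helper name `pick_multiscale_subset`, the file names, and the seed are hypothetical; only the 25%-50% range comes from the comment above.

```python
import random


def pick_multiscale_subset(filenames, fraction=0.25, seed=0):
    """Choose a reproducible subset of images to multi-scale.

    Hypothetical helper: fraction and seed are illustrative defaults.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # local RNG keeps the selection deterministic
    count = round(len(filenames) * fraction)
    # Sort first so the result does not depend on the input order.
    return sorted(rng.sample(sorted(filenames), count))


images = [f"img_{i:04d}.png" for i in range(100)]
subset = pick_multiscale_subset(images, fraction=0.25)
print(len(subset))  # 25 of the 100 images get extra scaled copies
```

Fixing the seed keeps the chosen subset identical between runs, which matters if the resulting datasets are to be compared under deterministic training.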

Phhofm commented 1 month ago

PS @Luciennnnnnn yeah, I am currently trying out Q-Align using the pyiqa inference script with the 8-bit option so I can run it, and it seems to work great. It is quite a bit slower than other options (like topiq_nr, for example), though: to test it out I tiled LSDIR into 179’006 512x512px tiles, and it takes me around 22 hours to score those with qalign_8bit.
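For a sense of scale, the figures above work out to a bit over two tiles per second. The sketch below just does that arithmetic; since the 22 hours is an approximate figure, the result is a rough estimate, and the helper name `scoring_throughput` is made up for illustration.

```python
def scoring_throughput(num_tiles, hours):
    """Return (tiles per second, seconds per tile) for a scoring run."""
    seconds = hours * 3600
    return num_tiles / seconds, seconds / num_tiles


tiles_per_s, s_per_tile = scoring_throughput(179_006, 22)
print(f"{tiles_per_s:.2f} tiles/s, {s_per_tile:.2f} s/tile")
# prints "2.26 tiles/s, 0.44 s/tile"
```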

On the HQ-50K dataset I used a qalign_8bit cutoff score of 4 on the tiles (so I removed all tiles that scored below 4 from the dataset). I'm also trying out complexity scoring, using a cutoff score of 0.3 or 0.4 on that dataset: Dataset_prep_complexity_part.pdf
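The cutoff step itself can be sketched as a simple filter over precomputed scores. This assumes each tile already has a quality score and a complexity score, and that tiles below either cutoff are dropped; the data layout, the helper name `filter_tiles`, and the example scores are made up for illustration.

```python
def filter_tiles(scores, quality_cutoff=4.0, complexity_cutoff=0.3):
    """Keep tiles whose quality and complexity both reach their cutoffs.

    scores maps a tile name to a (quality, complexity) pair. The layout is
    an assumption; the scores would be computed beforehand by whatever
    quality and complexity metrics are in use.
    """
    return sorted(
        name
        for name, (quality, complexity) in scores.items()
        if quality >= quality_cutoff and complexity >= complexity_cutoff
    )


# Made-up scores for illustration only.
example_scores = {
    "tile_0001.png": (4.6, 0.52),
    "tile_0002.png": (3.8, 0.61),  # dropped: quality below 4
    "tile_0003.png": (4.2, 0.18),  # dropped: complexity below 0.3
    "tile_0004.png": (4.0, 0.30),  # kept: exactly at both cutoffs
}
kept = filter_tiles(example_scores)
print(kept)
```

Keeping the scores separate from the filtering makes it cheap to try a different cutoff (such as 0.4 for complexity) without rescoring the tiles.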

Bezdarnost commented 3 weeks ago

> If you're interested in collaborating, please let me know.

Hi! I'd be happy to collaborate with you. Could you please share your Gmail or other contact details? I'm currently in the final stages of my paper for CVPR '25, so I’ll be available after the submission deadline (November 15). Does that work for you?

neosr-project / neosr

[Question] How Nomos-v2 is collected? Is there a benchmark on performance? #86