kdexd / virtex

[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations
http://kdexd.xyz/virtex
MIT License

run on single input image #1

Closed nikky4D closed 4 years ago

nikky4D commented 4 years ago

Hi,

I would like to evaluate your work on a single image for image captioning. Can you tell me the steps I should follow for a single input? For instance, given a folder of images, how would I use your model for inference only on that folder?

Looking at the captioning task in your documentation, I am not sure how to go about using my own dataset to evaluate the model.

Thanks

soumilkanwal80 commented 4 years ago

Did you figure this out?

kdexd commented 4 years ago

Hi @nikky4D and @soumilkanwal80:

We don't support this feature out of the box, but it is possible with a little hack in CocoCaptionsEvalDataset (https://github.com/kdexd/virtex/blob/master/virtex/data/datasets/downstream.py#L240-L277).

You could modify it to work with an image folder and use image filenames as keys.

On top of that, you can use https://github.com/kdexd/virtex/blob/master/scripts/eval_captioning.py to get results.

You can get started with this. In general it's a nice feature to have; I will add it in.

dzhelonkin commented 4 years ago

I tried to use a custom dataset class:

from typing import Callable

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

from virtex.data import transforms as T


class FileList(Dataset):
    r"""
    A dataset which provides only images (for inference) from a folder.

    Parameters
    ----------
    files: list, required
        List of paths to the images.
    image_transform: Callable, optional (default = virtex.data.transforms.DEFAULT_IMAGE_TRANSFORM)
        A list of transformations, from either `albumentations
        <https://albumentations.readthedocs.io/en/latest/>`_ or :mod:`virtex.data.transforms`,
        to be applied on the image.
    """

    def __init__(
        self,
        files,
        image_transform: Callable = T.DEFAULT_IMAGE_TRANSFORM,
    ):
        self.files = files
        self.image_transform = image_transform

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx: int):
        # Use the list index as the image ID and load the image as an RGB array
        # (the convert guards against grayscale or RGBA inputs).
        image_id = idx
        image = np.array(Image.open(self.files[idx]).convert("RGB"))

        # Apply transforms, then bring channels first (HWC -> CHW).
        image = self.image_transform(image=image)["image"]
        image = np.transpose(image, (2, 0, 1))

        return {
            "image_id": torch.tensor(image_id).long(),
            "image": torch.tensor(image),
        }

but faced another problem:

    OSError: Not found: "datasets/vocab/coco_10k.model": No such file or directory Error #2

@kdexd Could you provide the pretrained SentencePiece models?

kdexd commented 4 years ago

The SentencePiece vocab and model can be generated in a few seconds with a simple command (requires COCO train2017 captions): https://kdexd.github.io/virtex/virtex/usage/setup_dependencies.html#preprocess-data

Vocab/model generation is deterministic, given you use the same annotations.
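For intuition, the preprocessing step boils down to training a SentencePiece model on the caption strings. A minimal standalone sketch of the idea (the annotation path, output prefix, and options here are assumptions; the repo's preprocessing script is the source of truth):

import json
import sentencepiece as spm

# Collect all COCO train2017 caption strings into a plain-text corpus.
coco = json.load(open("datasets/coco/annotations/captions_train2017.json"))
with open("/tmp/captions.txt", "w") as f:
    for ann in coco["annotations"]:
        f.write(ann["caption"].lower().strip() + "\n")

# Train a 10k-piece model; the same input always yields the same files.
spm.SentencePieceTrainer.train(
    input="/tmp/captions.txt",
    model_prefix="datasets/vocab/coco_10k",
    vocab_size=10000,
)

This writes datasets/vocab/coco_10k.model and datasets/vocab/coco_10k.vocab, the files the error above complains about.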

dzhelonkin commented 4 years ago

It really turned out to be very simple, thank you.

nikky4D commented 4 years ago

> It really turned out to be very simple, thank you.

Can you give me a sample of the code you used and your setup?

dzhelonkin commented 4 years ago

> It really turned out to be very simple, thank you.
>
> Can you give me a sample of the code you used and your setup?

Setup

Ubuntu 18.04, with all the required dependencies for this repo installed.

Code

A new main function in scripts/eval_captioning.py (the script's existing imports are reused; only two new imports are needed):

import glob
from virtex.data import FileList

def main(_A: argparse.Namespace):
    if _A.num_gpus_per_machine == 0:
        # Set device as CPU if num_gpus_per_machine = 0.
        device = torch.device("cpu")
    else:
        # Get the current device (this will be zero here by default).
        device = torch.cuda.current_device()

    _C = Config(_A.config, _A.config_override)
    tokenizer = TokenizerFactory.from_config(_C)

    # FileList uses the list index as the image ID, which is used below to
    # recover the file path for printing.
    files = glob.glob("path/to/images/*.jpg")
    val_dataloader = DataLoader(
        FileList(files),
        batch_size=_C.OPTIM.BATCH_SIZE,
        num_workers=_A.cpu_workers,
        pin_memory=True,
    )
    # Initialize model from a checkpoint.
    model = PretrainingModelFactory.from_config(_C).to(device)
    ITERATION = CheckpointManager(model=model).load(_A.checkpoint_path)
    model.eval()

    for val_iteration, val_batch in enumerate(val_dataloader, start=1):
        for key in val_batch:
            val_batch[key] = val_batch[key].to(device)

        # Make a dictionary of predictions in COCO format.
        with torch.no_grad():
            output_dict = model(val_batch)

        # Decode each predicted caption and print it next to its file path.
        for image_id, caption in zip(
            val_batch["image_id"], output_dict["predictions"]
        ):
            print(files[image_id], tokenizer.decode(caption.tolist()))

Also change virtex/data/__init__.py to import the FileList class (placed in virtex/data/datasets/downstream.py):

from .datasets.captioning import CaptioningDataset
from .datasets.multilabel import MultiLabelClassificationDataset
from .datasets.downstream import (
    ImageNetDataset,
    INaturalist2018Dataset,
    VOC07ClassificationDataset,
    CocoCaptionsEvalDataset,
    FileList,
)

__all__ = [
    "CaptioningDataset",
    "MultiLabelClassificationDataset",
    "CocoCaptionsEvalDataset",
    "ImageNetDataset",
    "INaturalist2018Dataset",
    "VOC07ClassificationDataset",
    "FileList",
]
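With both changes in place, the script runs as before. A hypothetical invocation (paths are placeholders; the flags are the ones the script's existing argument parser already defines, matching the _A attributes used above):

python scripts/eval_captioning.py \
    --config /path/to/pretrain_config.yaml \
    --checkpoint-path /path/to/checkpoint.pth \
    --num-gpus-per-machine 1 \
    --cpu-workers 4

This prints one line per image: the file path followed by the decoded caption.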

kdexd commented 4 years ago

Looks very neat, glad you got this working! I will add this feature by end of week.

kdexd commented 4 years ago

I added this feature in master! Main additions are ImageDirectoryDataset and its usage in scripts/eval_captioning.py.

Refer to the updated instructions here:

  1. Image Captioning on COCO Captions val2017
  2. Running Image Captioning Inference on Arbitrary Images

Closing this issue for now. Feel free to re-open for any questions or issues!

nikky4D commented 4 years ago

Thanks so much everyone.

freeIsa commented 3 years ago

Hi there, I am also trying to run captioning on a folder of sample images on my machine. After generating the coco_10k.vocab file and correctly setting the paths to the model & config file in the example command line, I ran the command at the bottom of the documentation page (https://kdexd.github.io/virtex/virtex/usage/downstream.html) but got the following error:

  File "scripts/eval_captioning.py", line 113, in <module>
    main(_A)
  File "scripts/eval_captioning.py", line 86, in main
    "image_id": image_id.item(),
AttributeError: 'str' object has no attribute 'item'

Can you please help me figure out what is wrong with my process? Thank you!

kdexd commented 3 years ago

@freeIsa: Oops, I think you encountered an edge case: your image file names may contain alphabetic characters (not just numbers). I have handled this edge case, please pull from master! Let me know if you face any issues :-)
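For context, the failing line assumed every image ID was a tensor. A defensive version of that line (a sketch of the idea, not necessarily the exact fix in master):

import torch

# Filenames like "dog.jpg" produce string IDs, which have no .item();
# only unwrap the ID when it is actually a tensor.
image_id = image_id.item() if isinstance(image_id, torch.Tensor) else image_id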

freeIsa commented 3 years ago

Thank you @kdexd, now it's working! 🎉

08tjlys commented 3 years ago

> I added this feature in master! Main additions are ImageDirectoryDataset and its usage in scripts/eval_captioning.py.
>
> Refer to the updated instructions here:
>
>   1. Image Captioning on COCO Captions val2017
>   2. Running Image Captioning Inference on Arbitrary Images
>
> Closing this issue for now. Feel free to re-open for any questions or issues!

Hi @kdexd, I followed the newest instructions for running image captioning inference on my own images, but I still hit this error when running eval_captioning.py. Did I miss something before running the script?

    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
    OSError: Not found: "datasets/vocab/coco_10k.model": No such file or directory Error #2

kdexd commented 3 years ago

Hi @08tjlys, please follow Step 1 here: http://kdexd.xyz/virtex/virtex/usage/setup_dependencies.html#preprocess-data