issues
long8v
/
PTIR
Paper Today I Read
19
stars
0
forks
[160] ALOHa: A New Measure for Hallucination in Captioning Models
#179
long8v
opened
2 weeks ago
0
[159] Long-CLIP: Unlocking the Long-Text Capability of CLIP
#178
long8v
opened
1 month ago
0
[158] A Mathematical Framework for Transformer Circuits
#177
long8v
opened
1 month ago
0
240507 add text span
#176
long8v
closed
1 month ago
0
feat: add text span
#175
long8v
opened
1 month ago
0
feat: add LeGrad
#174
long8v
opened
1 month ago
0
[157] LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity
#173
long8v
opened
1 month ago
1
[156] Interpreting CLIP's Image Representation via Text-Based Decomposition
#172
long8v
opened
1 month ago
0
[155] Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
#171
long8v
opened
2 months ago
1
feat: llava next hf implementation
#170
long8v
opened
2 months ago
0
[154] Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
#169
long8v
opened
3 months ago
0
[153] Contrastive Explanations for Model Interpretability
#168
long8v
opened
3 months ago
0
🍅 Jjapjjari paper roundup (XAI)
#167
long8v
closed
1 month ago
8
🍅 Jjapjjari paper collection (VLM)
#166
long8v
closed
1 month ago
1
🍅 Jjapjjari paper collection (CLIP)
#165
long8v
closed
1 month ago
17
[152] Sigmoid Loss for Language Image Pre-Training
#164
long8v
opened
3 months ago
0
[151] FOIL it! Find One mismatch between Image and Language caption
#163
long8v
opened
4 months ago
0
[150] Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
#162
long8v
opened
4 months ago
0
[149] Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning
#161
long8v
opened
4 months ago
1
[148] I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
#160
long8v
opened
4 months ago
0
[147] Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
#159
long8v
opened
4 months ago
2
[146] Transformer Interpretability Beyond Attention Visualization
#158
long8v
opened
4 months ago
0
[145] CLIPScore: A Reference-free Evaluation Metric for Image Captioning
#157
long8v
opened
4 months ago
1
[144] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
#156
long8v
opened
6 months ago
0
[143] Honeybee: Locality-enhanced Projector for Multimodal LLM
#155
long8v
opened
6 months ago
2
[142] Trust Region Policy Optimization
#154
long8v
opened
6 months ago
2
[141] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
#153
long8v
opened
6 months ago
0
[140] Improved Baselines with Visual Instruction Tuning
#152
long8v
opened
6 months ago
1
[139] Davidsonian Scene Graph: Improving Reliability in Fine-Grained Evaluation for Text-to-Image Generation
#151
long8v
opened
6 months ago
0
[138] ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
#150
long8v
opened
6 months ago
0
[137] mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
#149
long8v
opened
7 months ago
1
[136] Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models
#148
long8v
opened
7 months ago
0
[135] Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text
#147
long8v
opened
7 months ago
1
[134] Asynchronous Methods for Deep Reinforcement Learning
#146
long8v
opened
8 months ago
0
[133] DataComp: In search of the next generation of multimodal datasets
#145
long8v
opened
9 months ago
0
[132] Hyperbolic Image-Text Representations
#144
long8v
opened
9 months ago
0
[131] Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
#143
long8v
opened
9 months ago
1
[130] Segment Anything
#142
long8v
opened
10 months ago
1
[129] Grounding Language Models to Images for Multimodal Inputs and Outputs
#141
long8v
opened
10 months ago
0
[128] Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
#140
long8v
opened
10 months ago
0
[127] Linearly Mapping from Image to Text Space
#139
long8v
opened
10 months ago
0
[126] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
#138
long8v
opened
10 months ago
0
[125] RILS: Masked Visual Reconstruction in Language Semantic Space
#137
long8v
opened
11 months ago
0
feat: add sparse rcnn
#136
long8v
opened
11 months ago
0
[124] LiT: Zero-Shot Transfer with Locked-image text Tuning
#135
long8v
opened
12 months ago
0
[123] Robust fine-tuning of zero-shot models
#134
long8v
opened
12 months ago
0
[122] Neural Architecture Search without Training
#133
long8v
opened
1 year ago
0
[122]
#132
long8v
closed
1 year ago
1
[121] Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities
#131
long8v
opened
1 year ago
0
feat: add open-clip
#130
long8v
opened
1 year ago
0